(4) Introduction

Malaria continues to be a major health threat throughout the tropical world, and
potential demand for antimalarials is higher than for any other medication. Yet the world
faces a crisis: drug resistance is emerging and spreading faster than drugs are being
developed, and the flow in the pipeline of new drugs has all but stopped. This represents
a  particular  threat  to  the  US  Military.  In  a  short  time  there  may  be  parts  of  the  world 
where  no  effective  antimalarial  drug  is  available.  The  recent  emergence  of  multidrug 
resistant  malaria  parasites  has  intensified  this  problem.  Recognizing  this  emerging  crisis, 
it  is  necessary  to  identify  new  strategies  for  the  identification  and  development  of  new 
antimalarials.  The  goal  of  this  work  is  the  development  of  a  framework  for  antimalarial 
drug  development  into  the  21st  century. 

A  new  strategy  for  drug  development  is  urgently  needed.  Current  drugs  are  based 
on  a  small  number  of  target  molecules  or  lead  compounds  and  in  most  cases  the  target  of 
drug  action  is  yet  to  be  identified.  Resistance  is  emerging  rapidly  and  the  mechanisms  of 
resistance are poorly understood. The identification of new targets or new candidate
drugs based on an understanding of parasite biology is a key element of this new
strategy. Clearly, the development of a new antimalarial will require both basic and
applied research working in concert with one another.

The goal of this work is to use a molecular genetic approach both in the
identification of new drug targets and in the investigation of mechanisms of drug
resistance. Two parallel approaches are being developed: first, the development and
characterization of a homologous transformation system and, second, the development of a
heterologous expression system in yeast for potential drug target enzymes. The yeast
expression system should allow rapid screening of new drugs, greatly increasing the rate
at which new antimalarials can be tested and developed. Both of these approaches are
based on the functional analysis of malaria genes, with the goal of using this information in
the identification and development of new antimalarial drugs. The development of these
tools should facilitate future drug development and allow us to translate our molecular
genetic knowledge into the practical identification and development of new antimalarials.
This is a new strategy, and it is being applied because of the crisis facing us in antimalarial
drugs. The previous strategy, namely lead-directed screening, must be supplemented by
new strategies or we will be faced with multiresistant Plasmodium falciparum and no
drugs to treat it.

Malaria  represents  a  major  and  increasing  threat  to  the  U.S.  Military.  Many  of  the 
sites  of  current  or  potential  U.S.  Military  involvement  are  endemic  for  malaria  and  in 
several  sites,  multidrug  resistant  P.  falciparum  represents  a  major  problem  especially  for 
non-immune  military  personnel.  Current  drugs  available  to  the  U.S.  Military  are  quickly 
losing their effectiveness because of emerging and spreading drug resistance. This work
is directed both at identifying new drugs and drug targets and, equally importantly, at
understanding drug resistance mechanisms, with the goal of preventing or
overcoming drug resistance in the malaria parasite.

(5) Body


During the grant period, the research has focused on two objectives, namely
the analysis of critical genes in Plasmodium falciparum for their role in drug
resistance and as potential new drug targets, using both the homologous P. falciparum
system and the heterologous yeast system. We have initiated experiments during this
grant period which take an alternate technical approach to achieving the goals in our
statement of work and represent applications of new technology which did not exist at the
time of our original planning process. These include the analysis of gene expression in
response to drug treatment using the method of Serial Analysis of Gene Expression and
the use of DNA chip technology in the analysis of the yeast heterologous system. These
approaches complement ongoing work and will provide us with new insights into drug
resistance and excellent tools for the identification of potential new drug targets.
Summaries of the ongoing work, including recent data, are included for each of the
projects. This report is for the entire grant period and includes information from previous
annual and other interim reports.

5.1  Functional  analysis  of  putative  drug  resistance  genes  and  new  drug  target 
genes  in  the  malaria  parasite  through  the  further  development  of  a 
transformation  system  for  the  malaria  parasite  including: 

1. Development of methods to express and modify parasite genes

2.  Development  of  methods  for  targeted  gene  disruption 

The overall goal of this work is to understand gene expression in the parasite, in
particular the expression of genes critical for drug response and resistance. This work
will also lead to the development of methods to identify critical genes as future drug
targets. One of the key obstacles hindering our progress in this work is the lack of a
fundamental understanding of gene expression in the parasite, which has limited our ability
to manipulate the organism. Another obstacle had been the limited number of genes that
had been examined in the parasite. Progress in the Plasmodium falciparum genome
project and the development of new technology have provided an opportunity to overcome
these obstacles. We have now initiated a project to analyze gene
expression in Plasmodium falciparum using the newly developed method of Serial
Analysis of Gene Expression. This work is being done in close collaboration with Dr.
Keith Martin, WRAIR.

Background 

The Plasmodium falciparum Genome Project has opened new approaches to drug
target identification, and through a functional analysis of whole genome expression, new
drug targets will be identified. The overall goal of this work is to use the knowledge
derived from understanding the profile and mechanism of gene expression to identify
novel targets for drug development. Approximately 60% of the predicted genes in the
Plasmodium falciparum genome do not yet have an identified function, and among these
will be genes critical for parasite survival and function. By using a functional genomics
approach to analysis, we hope to identify new classes of genes which we cannot
necessarily predict based on homologies with genes identified in other systems or
on common metabolic pathways. These pathways may also help us in
identifying new targets for drug development.


Genomes  to  Drugs  -  Opportunities  to  discover  new  drug  targets 


In  order  to  achieve  this,  a  more  fundamental  understanding  of  parasite  biology  is 
needed. Great progress has been made in the sequencing of the Plasmodium falciparum
genome, with two chromosomes assembled and annotated (Gardner et al. 1998; Bowman
et al. 1999) and high-throughput sequencing analysis complete for over 80% of the
genome at the three genome sequencing centers: The Institute for Genomic Research
(TIGR), the Sanger Centre and the Stanford Genome Center. Relating genomic sequence
to  function  and  ultimately  malarial  biology  is  the  next  logical  step.  One  approach 
involves  investigating  transcriptional  profiles  in  the  parasite  at  the  level  of  the  entire 
genome.  By  understanding  the  network  of  genes  expressed  by  an  organism,  complex  and 
interrelated  cell  functions  can  begin  to  be  unraveled.  Much  previous  work  has  focused  on 
single  genes,  yet  many  biological  processes  are  the  result  of  interactions  of  multiple  genes 
and  gene  products.  Such  global  transcriptional  studies  are  a  first  step  to  identifying 
participants  in  such  complex  processes  as  response  to  drug  treatment.  By  investigating 
gene  expression  on  a  genome-wide  scale,  we  hope  to  discover  key  features  of  the 
parasite’s  biology  and  discover  new  targets  that  could  lead  to  novel  drug  development. 

Serial  analysis  of  gene  expression  (SAGE)  is  particularly  well  suited  for  malarial 
systems,  as  little  is  known  about  gene  expression  and  many  of  the  genes  identified  in  the 
sequencing  project  are  of  unknown  function.  SAGE  is  an  extremely  powerful  tool  with 
which  to  simultaneously  and  quantitatively  analyze  mRNA  transcript  profiles  from  a 
given  cell  population,  allowing  for  the  discovery  of  new  genes.  For  the  first  time,  it  is 
now  possible  to  examine  the  response  of  all  of  the  genes  to  stimuli  such  as  drug 
treatment.  Techniques  have  been  developed  in  other  systems  and  adapted  to  malaria 
including  differential  display  (Liang  et  al.  1992)  (Thelu  et  al.  1994),  microarray  analysis 
(Lashkari  et  al.  1997)  and  Serial  Analysis  of  Gene  Expression,  SAGE  (Velculescu  et  al. 
1995)  (Hayward  et  al.  2000). 

Serial Analysis of Gene Expression

SAGE is well suited to an organism whose genome is not completely annotated
and provides an open platform for new gene discovery. SAGE allows the discovery of
new genes, as well as the detection of low-abundance transcripts, by qualitatively and
quantitatively analyzing thousands of transcripts in a given cell population at the same
time. The technique is based on three experimentally confirmed principles (Velculescu et
al. 1995): a) a short (10 bp) tag from a defined position within a transcript can uniquely
identify a gene. The reasoning is that the maximum number of possible tag
sequences, assuming a random nucleotide distribution (4^10 = 1,048,576), is far greater than
the number of estimated genes in most organisms; b) concatenation of several tags into a
single molecule allows for efficient sequencing and acquisition of data; and c) expression
patterns of induced genes are accurately represented by the abundance of their
corresponding tags. As such, SAGE can achieve levels of transcript profiling that have
not been approached by differential display, subtractive hybridization and EST (expressed
sequence tag) technologies (Carulli et al. 1998).
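
The tag-space argument in principle (a), and the idea of a tag taken from a defined position, can be illustrated with a short Python sketch. It assumes, following the usual SAGE protocol rather than anything spelled out in this report, that the tag is the 10 bp immediately 3' of the 3'-most NlaIII site (CATG). The function names and the example transcript are hypothetical.

```python
ANCHOR_SITE = "CATG"  # NlaIII recognition sequence
TAG_LENGTH = 10       # tag length in base pairs, as stated in the text


def tag_space(tag_length=TAG_LENGTH):
    """Number of distinct tags of this length over the alphabet A, C, G, T."""
    return 4 ** tag_length


def extract_sage_tag(transcript, site=ANCHOR_SITE, tag_length=TAG_LENGTH):
    """Return the tag immediately 3' of the 3'-most anchoring site, or None.

    Assumption: the 3'-most site defines the tag. None is returned when the
    transcript has no site or when fewer than tag_length bases follow it.
    """
    seq = transcript.upper()
    pos = seq.rfind(site)
    if pos == -1:
        return None
    start = pos + len(site)
    tag = seq[start:start + tag_length]
    return tag if len(tag) == tag_length else None


# Hypothetical transcript with two CATG sites; only the last one is used.
example = "AAACATGTTTTTTTTTTCATGACGTACGTACGGG"
print(tag_space())                 # 1048576
print(extract_sage_tag(example))   # ACGTACGTAC
```

A transcript without a CATG site yields no tag at all, which is one reason some expressed genes can be missing from a SAGE library.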

SAGE  has  been  successfully  applied  in  a  number  of  different  systems;  for 
example,  it  has  been  used  to  a)  characterize  the  entire  repertoire  of  expressed  transcripts 
in  yeast  (Velculescu  et  al.  1997);  b)  identify  p53  regulated  genes  (Madden  et  al.  1997) 
(Polyak  et  al.  1997);  c)  compare  differential  gene  expression  between  normal  human  and 
cancer  cells  (Zhang  et  al.  1997;  Hibi  et  al.  1998;  Hibi  et  al.  1999;  Lai  et  al.  1999);  and  d) 
profile gene expression in rice seedlings (Matsumura et al. 1999). In summary, SAGE
is an extremely efficient tool for qualitative and quantitative monitoring of global gene
expression.

Global  gene  expression  responses  to  drug  treatment 

The  recent  availability  of  complete  genome  sequences  and  methodologies  to  scan 
whole  genomes  has  allowed  investigators  to  ask  questions  about  the  global  response  of 
cells  to  various  stimuli  (Schena  et  al.  1995;  Schena  et  al.  1996;  Heller  et  al.  1997; 
Velculescu  et  al.  1997;  Schena  et  al.  1998).  Much  of  the  initial  work  was  done  in 
Saccharomyces  cerevisiae  and  has  led  to  the  surprising  observation  that  over  100  genes 
change  in  expression  levels  when  cells  encounter  toxic  drugs  or  nutrient  levels  change 
(DeRisi  et  al.  1997;  DeRisi  et  al.  2000).  These  results  imply  that  response  to  drug 
treatment  involves  the  interaction  of  several  gene  products  and  several  pathways  may 
exist  by  which  the  cell  can  resist  the  toxic  effects  of  the  drug.  Over  time,  a  particular 
pathway  may  predominate  in  resistant  cells,  but  the  expression  of  other  proteins  may 
continue  to  play  an  important  role.  In  yeast,  more  than  20  genes  are  turned  on  after 
treatment  with  phorbol  ester  implying  specific  transcriptional  activation  and  thus  opening 
a new avenue for the development of interventions. A specific aim of this project is to
investigate the global response of Plasmodium falciparum to treatment with antimalarial
drugs. This research will be accomplished using the method of Serial Analysis of
Gene Expression (SAGE) (Velculescu et al. 1995).

A  second  and  related  question  concerns  the  mechanism  by  which  parasites  are 
killed.  Does  treatment  with  toxic  drugs  result  in  a  unique  gene  expression  pattern  or  do 
cells  during  the  course  of  the  response  to  drug  treatment  express  a  common  set  of  genes? 
Little  is  known  about  the  events  that  lead  to  cell  death  in  parasites.  Is  the  response  to 
each  drug  or  immune  mediator  unique  or  is  there  a  common  cell-death  pathway?  In 
higher  eukaryotes,  specific  pathways  are  involved  in  cell  death,  termed  apoptosis,  which 
can  be  stimulated  by  many  different  events  including  the  treatment  of  cells  with  toxic 
drugs.  The  requirement  of  programmed  and  orderly  cell  death  during  the  development  of 
multicellular  organisms  is  thought  to  be  the  origin  of  the  apoptotic  pathway.  In 
unicellular  organisms  such  as  Plasmodium  falciparum,  whether  such  a  pathway  exists 
remains  an  open  question.  Only  a  single  publication  in  the  literature  provides  evidence 
for  DNA  fragmentation  after  chloroquine  treatment,  consistent  with  an  apoptotic  pathway 
(Picot et al. 1997). The goal of this work is to explore the parasite’s response to several
different toxic compounds, including immune mediators, and examine the gene
expression profile using the approach of whole genome scanning.

One of the overall goals of this work is to use the knowledge derived from
understanding the mechanisms and networks that control gene expression to identify
novel targets for drug development. Approximately 60% of the predicted genes in the
Plasmodium falciparum genome do not yet have an identified function, and among these
will be genes critical for parasite survival and function. By using a functional genomics
approach to analysis, we hope to identify new classes of genes which we cannot
necessarily predict based on homologies with genes identified in other systems or
on common metabolic pathways. At least one class of genes will be
involved with the regulation of gene expression, and these may prove to be unique to
Plasmodium falciparum and provide new insights into novel aspects of parasite biology.
These pathways may also help us in identifying new targets for drug and vaccine
development.

Experimental Design and Methods

We have been successful in establishing the SAGE technology for Plasmodium
falciparum. Our results are summarized below and provided in greater detail in the
Appendix (Munasinghe et al., submitted; Patankar et al., submitted). We have
presented  results  demonstrating  the  feasibility  of  the  SAGE  methodology  applied  to  the 
Plasmodium  falciparum  asexual  stage  parasite  system.  This  sets  the  stage  for  examining 
global  gene  expression  profiles  under  a  variety  of  conditions  and  will  lead  both  to  the 
identification  of  networks  of  genes  coordinately  regulated  and  to  the  identification  of  new 
genes  and  pathways  critical  for  parasite  survival.  We  have  also  presented  results 
demonstrating  our  ability  to  functionally  analyze  cis-elements  hypothesized  to  be 
important  for  gene  regulation.  This  approach  will  be  critical  for  the  analysis  of  the  gene 
expression  networks  identified  by  the  SAGE  analysis.  In  the  proposed  work,  we  will 
extend  our  SAGE  analysis  to  analyze  the  parasites  under  conditions  of  biological 
relevance  and  then  use  that  information  to  identify  genes  or  groups  of  genes  for  further 
functional  analysis. 

Development  and  optimization  of  the  Serial  Analysis  of  Gene  Expression  (SAGE) 
technology  for  Plasmodium  falciparum 

We have demonstrated in the preliminary data and accompanying detailed
manuscript the feasibility of applying the SAGE technology to Plasmodium falciparum.
Under this proposal we plan to use this technology to analyze differential global gene
expression under different growth conditions. Additional development of the technology
is also planned, including the development of bioinformatics support and the optimization
of the technical aspects of the methodology. The results of our work are summarized below.
Additional detailed information can be found in the preprint and manuscript (Munasinghe
et al., 2001; Patankar et al., submitted).

Genes expressed by Plasmodium falciparum asexual stage parasites

The first set of experiments analyzed the genes that are expressed by the
asexual stage parasites using the SAGE methodology. Plasmodium falciparum (3D7)
parasites were synchronized, harvested at the trophozoite stage (70% trophozoites)
and subjected to SAGE analysis. Trophozoites were chosen as the first target for SAGE
analysis because of the large body of evidence that, in the asexual blood stages, the
majority of RNA synthesis occurs during the trophozoite stage. In subsequent experiments,
synchronized parasites will be isolated at different stages of the asexual blood cycle and
expression profiles will be compared. In the initial experiments, we analyzed
approximately 7000 individual tags and determined their abundance in the tag population
(see Table 1; note that we report the abundance only of those tags present at 2 or more
copies).

As can be seen from these data, only a small percentage of mRNAs are present in
high abundance; only 11 genes are found in the highest abundance classes, while greater
than 80% of the tags are present at 10 to 50 times lower levels. In addition, those tags
present at 2 copies or more in the SAGE library correspond to 1047 genes, or
approximately 15% of the total predicted genes in the parasite genome. If the single tag
data are included, then trophozoites express approximately 3500 genes, or about 50% of
the total predicted genes. These data are quite consistent with other systems such as
Saccharomyces cerevisiae, where about 3500 of the predicted 6500 genes are expressed
during vegetative growth with fewer than 1000 genes expressed at high abundance. This
implies that a significant number of genes are not expressed during the trophozoite stage of
the parasite, although there may be some transcripts that do not contain an NlaIII site,
are expressed at very low levels or undergo very high turnover.
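
The percentages quoted above can be cross-checked with simple arithmetic. The short Python sketch below uses only the figures given in the text; the variable and function names are illustrative, and the implied genome-wide gene total is an inference from the rounded percentages, not a number reported here.

```python
def implied_total(count, fraction):
    """Total implied when `count` items make up `fraction` of the whole."""
    return count / fraction


# 1047 genes are stated to be approximately 15% of the predicted genes.
total_from_multicopy_tags = implied_total(1047, 0.15)   # about 6980

# About 3500 genes are stated to be about 50% of the predicted genes.
total_from_all_tags = implied_total(3500, 0.50)          # 7000

# Yeast comparison from the text: about 3500 of 6500 predicted genes.
yeast_fraction_expressed = 3500 / 6500                   # about 0.54

print(round(total_from_multicopy_tags), round(total_from_all_tags))
print(round(yeast_fraction_expressed, 2))
```

Both parasite figures point to a total of roughly 7000 predicted genes, so the two percentages are mutually consistent.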


Table 1.

Percentage      Total number     Total number     Matches to NCBI
frequency*      of tags          of genes         P. falciparum database

>2.2            87 (2.4%)        1 (0.1%)         1 (0.1%)
1.1-2.2         90 (2.5%)        2 (0.2%)         2 (0.2%)
0.55-1.0        226 (6.3%)       8 (0.8%)         7 (0.7%)
0.28-0.53       370 (10%)        30 (2.8%)        25 (2.4%)
0.14-0.25       632 (18%)        105 (10%)        78 (7.4%)
0.06-0.11       2196 (61%)       901 (86%)        ND
Total           3601 (100%)      1047 (100%)      ND

The tags are divided into abundance classes according to frequency of appearance
among 3601 tags comprising the expression profile of the 3D7 control population. The
number of tags matching an entry in the P. falciparum database is listed per abundance
class, and the percentage of hits among 1047 unique tags is given in brackets.
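
The grouping of tags into abundance classes can be sketched in Python as follows. This is a minimal illustration, not the software used to produce Table 1: it assumes that tags seen fewer than 2 times are dropped before frequencies are computed, that a tag's percentage frequency is its copy number divided by the number of retained tags, and that each class is defined by a lower bound. The function name, the toy tag list and the class bounds in the example are hypothetical.

```python
from collections import Counter


def abundance_classes(tags, lower_bounds, min_copies=2):
    """Group tags into abundance classes by percentage frequency.

    tags         : list of observed tag sequences, one entry per sequenced tag
    lower_bounds : lower bounds (in percent) of the abundance classes
    min_copies   : tags seen fewer times than this are ignored (assumption)

    Returns (total retained tags, {bound: {"tags": n, "unique": m}}).
    Tags whose frequency is below every bound are not assigned to a class.
    """
    counts = Counter(tags)
    kept = {tag: n for tag, n in counts.items() if n >= min_copies}
    total = sum(kept.values())
    bounds = sorted(lower_bounds, reverse=True)
    summary = {b: {"tags": 0, "unique": 0} for b in bounds}
    for tag, copies in kept.items():
        percent = 100.0 * copies / total
        for b in bounds:
            if percent >= b:          # first (highest) class the tag reaches
                summary[b]["tags"] += copies
                summary[b]["unique"] += 1
                break
    return total, summary


# Toy data: two tags pass the 2-copy cutoff, two singletons are dropped.
toy = ["AAAAAAAAAA"] * 6 + ["CCCCCCCCCC"] * 3 + ["GGGGGGGGGG", "TTTTTTTTTT"]
total, classes = abundance_classes(toy, lower_bounds=[50.0, 10.0])
print(total)     # 9
print(classes)   # {50.0: {'tags': 6, 'unique': 1}, 10.0: {'tags': 3, 'unique': 1}}
```

Because the class limits printed in Table 1 are rounded percentages, applying them literally as lower bounds to real counts could misplace tags that sit on a class boundary; working from copy numbers avoids that.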


The identity of the genes in the highest abundance classes was assigned using
BLAST analysis against both the GenBank database entries at NCBI and the high-throughput
shotgun DNA sequence available at the TIGR, Sanger and Stanford Genome
Centers. In addition, for a subset of the analyses, we used sequences assembled by
Jessica Kissinger and David Roos at the University of Pennsylvania. As can be seen in
Table 2 below, the highly expressed genes include several that are known as well as
several matches to hypothetical or unknown proteins. These unknown
and hypothetical proteins are likely to represent highly expressed genes in pathways not
yet uncovered by traditional approaches and may represent novel parasite pathways
critical for parasite survival and growth. Such genes may point to new pathways to target
for drug development.


Highly expressed genes

Tag             % Abundance     Gene description

TCAGGCGTTA      1.3             cytochrome oxidase (mitochondrial gene)
GAAGTCGAAA      0.45            5.8S ribosomal RNA
ATTTGAAGCA      0.42            Rhop H3
GTAGTTGACA      0.36            hypothetical protein
CTAAAGCACC      0.33            ras-related nuclear protein
TTGAAGCTGA      0.28            heat shock protein
CGAGGAAAAA      0.27            serine repeat antigen
AACGACAAGA      0.25            Pfg27/25
CCAAATGATG      0.25            polyubiquitin
TACAGCTGCT      0.21            merozoite surface protein
GGGAAAGCGA      0.19            hypothetical protein
TTGAGGATTC      0.19            rifin
GGAAATAAAG      0.18            unknown protein

Table 2: Highly expressed genes in the 3D7 control SAGE library. Tag represents the 10 bp SAGE tag
adjacent to the NlaIII site. Gene description details the gene corresponding to a particular tag. Abundance is
listed as a percentage of all 6702 tags in the 3D7 control SAGE library.

Northern  Analysis  to  confirm  SAGE  data 

These initial experiments have demonstrated the feasibility of this approach;
however, additional data will be needed to fully validate this method and to use the
method for further analysis. First, the abundance levels indicated by the SAGE analysis
will need to be confirmed using other methods. Our first approach was to examine several
of the genes by quantitative Northern blot analysis, and this analysis
confirmed the SAGE data (see Patankar et al., submitted). Two other
methods could be used to compare abundance of mRNA. One is the use of
microarrays, and we will collaborate with groups using that technology to compare our
SAGE data with their data. In addition, for certain genes where the exact amount of mRNA
present in the sample is critical to further experiments, quantitative PCR methods can be
developed.

Definition  of  the  Plasmodium  falciparum  Transcriptome 

A second major aspect of this specific aim is to create a new SAGE tag library in
a second experiment using P. falciparum 3D7 trophozoites. This will allow us to
determine the reproducibility of the tag library and will also give us additional tags to
include in the analysis. An empirical method was developed by Velculescu and coworkers
(Velculescu et al. 1995) to determine the minimum number of tags necessary to describe
what they termed the “Yeast Transcriptome”. They measured the number of new tags
obtained as a function of the number of tags analyzed, and determined that a library of
15,000 tags had reached saturation in terms of the acquisition of new tags. In a similar
analysis using our initial data, we have determined that for the high abundance class of
tags, a library of 2000 tags was adequate for these tags to be identified and to be sorted
into the high abundance class. If we analyze total tags as above, then we have not yet
reached saturation or a plateau in the acquisition of new tags, and we predict, based on our
data and data from the yeast system, that we will need approximately 15,000 tags from a
single library. One of our first goals will be to generate such a library to serve as the
basis of comparison for all of the other work. Results from this library will be posted on
the web, either through our own website and/or through the Plasmodium falciparum
Genome Project database.
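
The saturation analysis described above, in which the number of new tags is followed as a function of the number of tags analyzed, can be sketched in Python. This is an illustrative outline only; the function names and the toy input are hypothetical, and the tags are assumed to be supplied in the order in which they were sequenced.

```python
def discovery_curve(tags, step):
    """Cumulative count of distinct tags after every `step` tags analyzed.

    tags : list of tag sequences in sequencing order
    step : interval (number of tags) at which the curve is sampled
    Returns a list of (tags_analyzed, distinct_tags_seen) pairs.
    """
    seen = set()
    curve = []
    for i, tag in enumerate(tags, start=1):
        seen.add(tag)
        if i % step == 0 or i == len(tags):
            curve.append((i, len(seen)))
    return curve


def new_per_interval(curve):
    """Number of previously unseen tags gained in each sampling interval."""
    gains = []
    previous = 0
    for _, distinct in curve:
        gains.append(distinct - previous)
        previous = distinct
    return gains


# Toy example: the last interval adds no new tags, i.e. a plateau.
toy = ["A", "B", "A", "C", "A", "A"]
curve = discovery_curve(toy, step=2)
print(curve)                    # [(2, 2), (4, 3), (6, 3)]
print(new_per_interval(curve))  # [2, 1, 0]
```

A library would be considered close to saturation when the gain per interval approaches zero; for real data the order of the tags could be shuffled several times to smooth the curve.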

Once  the  SAGE  library  has  been  completed  for  the  trophozoite  stage,  we  will 
make  libraries  for  the  ring  and  schizont  stages  of  the  parasite  life  cycle.  Each  of  these 
libraries  will  be  made  using  synchronized  parasites  as  described  in  the  preliminary  data 
and  quantitative  analysis  of  Giemsa  stained  thin  blood  smears  will  be  used  to  assess  the 
purity  of  the  preparations. 

Annotation  of  SAGE  tag  data 

One  of  the  major  goals  of  this  work  is  to  understand  the  networks  or  groups  of 
genes  that  are  coordinately  expressed  by  the  parasite.  In  the  initial  experiments,  we  will 
determine  all  of  the  genes  expressed  during  the  trophozoite  stage.  This  should  give  us 
insights  into  metabolic  pathways,  expression  of  surface  molecules  and  identify  several 
unknown  genes  to  focus  on  for  further  analysis.  In  our  preliminary  data  we  have  begun 
this  analysis  for  the  tags  expressed  in  the  highest  abundance  classes  (shown  in  Table  2). 
We  will  continue  this  analysis  with  the  remainder  of  the  tags  and  more  importantly  with 
the  larger  3D7  SAGE  library.  This  process  will  require  some  additional  software 
development  since  the  majority  of  the  Plasmodium  falciparum  genome  sequence  is  not 
fully annotated. The work done under this specific aim will contribute significantly to
annotation of the library by identifying genes with regard to their expression
profile. One advantage of this SAGE library is that once the data are collected, they will
continue to be useful in identifying expressed genes as more of the genome is annotated.

High  Abundance  Class  mRNAs 

Another  important  outcome  of  this  work  is  the  identification  of  key  metabolic 
pathways  in  the  parasite.  For  example,  the  most  abundant  tag  corresponds  to  a 
mitochondrial  gene,  cytochrome  oxidase.  Consistent  with  this,  Vaidya’s  group 
(Srivastava  et  al.  1999;  Srivastava  et  al.  1999)  has  identified  the  target  of  atovaquone,  the 
newest  antimalarial  drug  to  be  developed,  as  mitochondrial  cytochromes.  The  baseline 
data  provided  by  the  3D7  trophozoite  stage  SAGE  library  will  provide  us  with  a 
description of all of the genes expressed by the parasite in this metabolically active stage
of the parasite life cycle. In addition, it will give us greater insight into which metabolic
pathways are likely most active in the various stages of the parasite life cycle.

Global  Gene  Expression  Following  Drug  Treatment 

One of the most powerful approaches to understanding key pathways in parasite
biology is to examine the changes in gene expression under different conditions. Of
primary interest to our laboratory is the effect of drug treatment on parasites. This gives
us both insights into putative mechanisms of action and the potential to identify new
targets in existing pathways. To initiate this work, we have treated Plasmodium
falciparum parasites with chloroquine under conditions that will kill the parasite over the
course of a 48-hour treatment.
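
One simple way to compare a drug-treated SAGE library with a control library is to normalize each tag count by library size and take the ratio. The Python sketch below illustrates this under stated assumptions: the log2 ratio and the pseudocount of 0.5 are choices made for the illustration, not the method described in this report, and the function name and toy libraries are hypothetical. No statistical test is included.

```python
import math
from collections import Counter


def log2_fold_changes(control_tags, treated_tags, pseudocount=0.5):
    """Log2 ratio of normalized tag frequency, treated over control.

    Each library is a list of tag sequences, one entry per sequenced tag.
    The pseudocount (an assumption) avoids division by zero for tags that
    appear in only one of the two libraries.
    """
    control = Counter(control_tags)
    treated = Counter(treated_tags)
    n_control = sum(control.values())
    n_treated = sum(treated.values())
    if n_control == 0 or n_treated == 0:
        raise ValueError("both libraries must contain at least one tag")
    changes = {}
    for tag in set(control) | set(treated):
        freq_control = (control[tag] + pseudocount) / n_control
        freq_treated = (treated[tag] + pseudocount) / n_treated
        changes[tag] = math.log2(freq_treated / freq_control)
    return changes


# Toy libraries: tag "A" rises and tag "B" falls after treatment.
control_library = ["A"] * 4 + ["B"] * 4
treated_library = ["A"] * 6 + ["B"] * 2
changes = log2_fold_changes(control_library, treated_library)
print(sorted(changes, key=changes.get))  # ['B', 'A'], lowest ratio first
```

Tags with large positive or negative values would be candidates for follow-up, subject to confirmation by an independent method such as Northern analysis.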

5.2  Analysis  of  Gene  Expression  in  Plasmodium  falciparum 

As  the  work  on  the  global  analysis  of  gene  expression  is  ongoing,  we  have 
continued  our  efforts  to  understand  gene  expression  at  the  level  of  the  individual  gene  in 
order  to  develop  methods  to  modify  expression  using  molecular  genetics.  Again,  with  the 
additional  information  provided  by  the  genome  project,  we  have  been  able  to  make 
excellent progress on this work and have made a preliminary observation which indicates
that we may be able to readily identify cis-acting elements which control gene expression
through an analysis of comparative genomics.

5’ Untranslated Region sequence variation in strains of Plasmodium falciparum

Transcriptional regulation has not been well defined in Plasmodium falciparum.
A review of studies that have looked at the regulation of different genes indicates that
regulation in the parasite may be different from the classic model of regulation in
eukaryotes. While the mechanism may be different, it is likely that transcriptional
regulation plays an important role in genes such as pfmdr1. This is supported by the
evidence in yeast, where PDR genes are transcriptionally regulated. Thus, in an effort to
further characterize the role of pfmdr1 in drug resistance, we have begun to map the
5’ untranslated region of the gene.

A contig on chromosome 5 containing 3D7 strain pfmdr1 coding and noncoding
sequence has been extracted from the Plasmodium falciparum genome database. The
coding region of this contig has 100% identity with the coding sequence of the D10 strain
pfmdr1 clone in GenBank (gi:9935). However, the 5’UTR of the genes has only 60%
identity. This reduction in identity occurs in a pattern of complete homology, followed
by minor variation, followed by major variation at the most upstream end of the 3D7 and
D10 sequences. This pattern may be indicative of selective pressure on the gene.

We were interested to see if this pattern of increasing variation was present in
other P. falciparum genes. A comparison of upstream sequence of the 3D7 and T9/96
(gi:160127) P. falciparum calmodulin genes revealed a similar pattern. Of course, the
possibility of cloning artifacts and sequencing errors must be ruled out before the
significance of this result can be determined. Both sets of sequences were aligned
utilizing the Clustal W alignment tool. They were anchored at the putative translational
start site, and matched in length to minimize gaps. 3D7, the reference strain, is
highlighted in bold for both alignments.
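
The pattern described above (identity near the translational start site, increasing variation further upstream) can be made quantitative by scoring identity in consecutive windows of an alignment. The following is a minimal sketch, not the Clustal W scoring scheme: the function name, the window size and the decision to count gaps and ambiguous bases (N) as mismatches are all assumptions made for illustration, and the two rows are toy sequences rather than the pfmdr1 or calmodulin data.

```python
def window_identity(top, bottom, window=60):
    """Percent identity in consecutive, non-overlapping windows of two aligned rows.

    Both rows must already be aligned (equal length, '-' for gaps).
    Assumption: a column counts as a match only if both rows carry the same
    base and that base is neither a gap ('-') nor an ambiguous call ('N').
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    scores = []
    for start in range(0, len(top), window):
        a = top[start:start + window].upper()
        b = bottom[start:start + window].upper()
        matches = sum(1 for x, y in zip(a, b) if x == y and x not in "-N")
        scores.append(100.0 * matches / len(a))
    return scores


# Toy rows (hypothetical): divergent at the upstream end, identical near the start codon.
top = "ATGCATGCAT" + "TTATTATTTT"
bottom = "ATACTTGC-T" + "TTATTATTTT"
print(window_identity(top, bottom, window=10))  # [70.0, 100.0]
```

Plotting such window scores against distance from the start codon would show whether the drop in identity is gradual or abrupt, which a single overall identity figure cannot.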

SEQUENCE ALIGNMENT OF 3D7 AND D10 pfmdr1 5’UTR.

80.5%  identity  in  503  nt  overlap;  score:  1217 

10  20  30  40  50  60 

TAAAATATACATAATTAAATATAAAAATGACATTATATTTTTGTTAAATTATACAGAAGA 


TAAATCTTTTATAA  -  — AAATATAATAATTAATAATTTTTTTTAATAAATAATTTTGTTTA 
10  20  30  40  50  60 

70  80  90  100  110  120 

AAAAAAAAAAAAAAAAAAATAGAAGTAAATTGTATAGAATTATTTNTTTATTAATATTAT 


AATTAATAATATGTAATTTTATTATTTATTTATTAACATTTTTTTTTATATTT - TATATA 
70  80  90  100  110  120 

130  140  150  160  170  180 

T  ATTTTATTTTGAATAA  -  — AACTATTTTNGTATCTAATAATAAATA  -  TAATAACAC  ATAT 


AAATACATATATAATAACTAAATTTATGCGCATATAAAAATATCTAATAATTTTAATTAT 
130  140  150  160  170  180 

190  200  210  220  230  240 

ATATATATATATAT  -  ATAT  ATAT  ATAT  ATTATT - TNANTTATTATTATATTTTTTTTT 


ATATAT  -  TATATATTATACATATAATTATTATAACGTTATATATTATTATATTATATTAT 
190  200  210  220  230 

250  260  270  280  290  300 

TTATTATTTTTTTTGTCATTGTGTAAATATATAAATATATATANTATATATATATTATTA 


T - ATTATTTTTTTTGTC ATTGTGTAAATATATAAATATATATATTATATATATATTATTA 
240  250  260  270  280  290 

310  320  330  340  350 

TTTCAACATTGTTT ATAT ATAT ATATATATATAT ATATAT- - - - TTATATTTATATATTG 


TTTCAACATTGTTTATATATATATATATATATATATATATATATTTATATTTATATATTG 


300  310  320  330  340  350

360  370  380  390  400  410

ATATATGTGTACATAGCTTATTTCATTTATAAGATTTAGATTTTGTTTTTAATATTATAT 


ATATATGTGTACATAGCTTATTTCATTTATAAGATTTAGATTTTGTTTTTAATATTATAT 


360  370  380  390  400  410

420  430  440  450  460  470

AATTTTGTTTGTTACAAATTAATTAATTATTTCTTTATTTCTTTATTTTATTTACATTTT 


AATTTTGTTTGTTACAAATTAATTAATTATTTCTTTATTTCTTTATTTTATTTACATTTT 
420  430  440  450  460  470 

Sequence alignment of 3D7 (TOP) and T9/96 (BOTTOM) Calmodulin 5'UTR.

76.2%  identity  in  1015  nt  overlap;  score:  2163 

60  70  80  90  100  110 

AAAT  ATT  ATTTAT  AACAAG  AG  AAAAGGC  AG  AAACAAAATAA  -  ATTAT  AATAAAAAAC  AC  A 

AACCATTTTGTAAAAAAAATTAAAATATATTTATATAATATTATTTTATTTTATTATATA 
80  90  100  110  120  130 

120  130  140  150  160  170 

TTTTTTTATATTTGTATGAATATATTTTTTGTTATGCCTAAAAAAAAATAGGATTATC - A 


TTATATTATTTTTATTTTTATTTTTATTTTTTTTTCTCTACAAATT - TTATCTA 

140  150  160  170  180 


180  190  200  210  220  230 

TATTTTTATATAAAATGTAAGGATTTCAAAATATATATAATTT - TTTAAAATAACAAA 


TTGGTTTATTATAAAAATATCTATTTCTAATAATAAATAATTAAGATATCAATTTATAGA 


190  200  210  220  230  240

240  250  260  270  280  290

AAGGGAACATTTTTTTTTTTTTTTTTAACATTTTCATGCCACGTTGACAAGAATTTTTAA

AACAAAATATATACTTGTATAATTTTATTTTTTTATATAAATCATTACATATATAATTAT
250  260  270  280  290  300

300  310  320  330  340  350


AAAATCCATTAAATTAAAAATAACTTTTTTATTTATTTAAATAAGATATTCAAATAAGGA 

ACAATATTTTTTCTAAGAGATAA - TTATATATT - AATATATATAAAAAAAGG 

310  320  330  340  350 


360  370  380  390  400  410 

TATTTATTAATTAGCTCGCAAATGGCCAAATAAGAAATATAATATAATATATTATTATAT 


TGTTTTTTTTTTTTTTTTTTATTTTT - ATTTTTATTTTATGGTAAT ATTTTATTTTCC 


360  370  380  390  400  410

420  430  440  450  460  470


ATATTAT  AT  ATAT  AT  AT  AT  AT  AT  AAAT  AT  ATTTAT  AAT  AAT  A  -  AT  AT  AAAT  AAAGT  AT  AT 

TTATTTTATAAAT - TATATTAGTTTATATGTGATTAATTTTATATATTATCAATTTATAT 


420  430  440  450  460  470

480  490  500  510  520  530

GAAAAT ACAAAATGTTA  -  TTATGTAT  AT  ATAATATATAAT  ATATAATATTAT  ATATGTAA 

- - ATTTTTAAATGCTTACTTAATTATCTTTTTTTTTTTTTTTTTTTTTTTTTCCCCTCTT 
480  490  500  510  520 

540  550  560  570  580  590 

TAAATCAAAAAGAATATATAAATATTATATATATATATATATATATAATATATATATATA 

TTTATATTAATTTATTTTTGAAAAA-ATTGATATATATATATATATATAATATATATATA 
530  540  550  560  570  580 


600  610  620  630  640  650 

TACATGTAGTAGTATTAAACAATGTATAATATATATAAATAATATATTTATATATTTCAT 


TACATGTAGTAGTATTAAACAATGTATAATATATATAAATAATATATTTATATATTTCAT 
590  600  610  620  630  640 

660  670  680  690  700  710 

TTCAATTTTAATTTTTTTTG - TTTTTTTTTTTTTCTTTTTGTCATATTTAAAAAAAATT 


TTCAATTTTAATTTTTTTTGGTTTTTTTTTTTTTTCTTTTTGTCATATTTAAAAAAAATT 
650  660  670  680  690  700 

720  730  740  750  760  770 

ATATTCATATAAGTTATGCATTTTTTATAAACATTATTCAATATATGTATAATATAATAT 


ATATTCATATAAGTTATGCATTTTTTATAAACATTATTCAATATATGTATAATATAATAT 
710  720  730  740  750  760 

780  790  800  810  820  830 

ATATATATATATTAATGTATTATTCCAATGTGCATGATAAAAGAAAAAAATAATATTTAT 


ATATATATATATTAATGTATTATTCCAATGTGCATGATAAAAGAAAAAAATAATATTTAT 
770  780  790  800  810  820 

840  850  860  870  880  890 

AAAAAAAAAGAAAAATAAAACAAAAAAAGAAAAAAAAAAAAAAAAAAAAAAAAATACAAA 


AAAAAAAAAGAAAAATAAAACAAAAAAAGAAAAAAAAAAAAAAAAAAAAAAAAATACAAA 
830  840  850  860  870  880 

900  910  920  930  940  950 

AATAAATAATATAATTTATAATTATATATTCTTGTCACAATAAAAATATATATATATATA 


AATAAATAATATAATTTATAATTATATATTCTTGTCACAATAAAAATATATATATATATA 
890  900  910  920  930  940 

Functional analysis of cis-elements in Plasmodium: Basal transcriptional element
identified

The  goal  of  this  work  is  to  identify  and  characterize  cis-elements  that  regulate 
gene  expression  in  the  malaria  parasite.  We  have  used  the  model  system  developed  under 
this  funding,  namely  transfection  of  Plasmodium  gallinaceum  zygotes,  for  these 
experiments.  It  has  the  advantage  of  being  a  robust  system  which  readily  allows 
functional  testing  of  cis-elements  including  analysis  of  mutated  sequences  and  subsequent 
biochemical  characterization.  The  first  manuscript  for  this  work  has  been  accepted  for 
publication  and  a  second  is  in  the  final  stages  of  preparation.  The  work  is  summarized 
below.  The  technology  described  here  will  be  applied  to  analysis  of  P.  falciparum 
putative  promoter  elements  (see  above). 

The  malaria  parasite  undergoes  a  complex  developmental  process  through  its  life 
cycle.  This  includes  an  asexual  intraerythrocytic  cycle  in  the  vertebrate  host,  and  a  sexual 
cycle  that  commences  with  gametogenesis  in  the  vertebrate  host  and  subsequent 
fertilization  and  maturation  in  the  mosquito  vector.  Regulation  at  the  transcriptional  and 
post-transcriptional  levels  is  no  doubt  important  for  the  temporal  expression  of  genes 
required  at  each  stage  of  development.  Present  understanding  of  the  cis-elements 
important  for  transcriptional  control  in  Plasmodium  is  severely  restricted.  Sequence 
analysis of 5’ flanking regions of Plasmodium genes reveals the presence of sequences
with homology to known eukaryotic control elements, for example see [1, 2]; however,
the functional significance of these sequences in Plasmodium has not been demonstrated.
The intergenic region in Plasmodium spp. is particularly AT-rich, even within the context
of the AT-biased (~80%) genome [3], such that even the identification of TATA-like
elements, and assays to determine their utility and importance, becomes difficult. A
growing but limited number of functional analyses of promoter regions of Plasmodium
genes have been published, many of which shed light on regions that are necessary for
efficient expression [2, 4]. However, only a few studies to date have identified specific
sequences,  short  of  transcriptional  start  sites,  that  appear  to  be  important  for  gene 
expression  [4-6].  Due  to  the  small  numbers,  and  the  fact  that  these  genes  are  expressed  at 
different  stages  in  the  parasite  life  cycle,  no  consensus  or  common  sequences  could  be 
established.  Clearly,  much  more  can  be  learned  about  aspects  of  basal  transcription  as 
well  as  stage-specific  control  of  gene  expression  in  the  malaria  parasite. 

Pgs28  is  expressed  abundantly  on  the  surface  of  mosquito  stages  of  the  avian 
parasite,  Plasmodium  gallinaceum.  Pgs28  belongs  to  the  family  of  Pxs  proteins,  which 
includes  the  P.  berghei  homolog  Pbs21  and  the  P.  falciparum  homolog  Pfs25.  These 
proteins contain a series of EGF-like domains that may serve a function in cell signaling
or  in  adhesion  [7,  8].  Pgs28,  Pfs25  and  Pbs21  had  been  identified  as  targets  for 
transmission  blocking  antibodies  [9-12].  Transcripts  of  pbs21  had  been  observed  in 
female  gametocytes  and  gametes,  as  well  as  zygotes  and  ookinetes,  and  the  pfs25 
promoter  appears  to  be  specifically  active  in  mosquito  stage  parasites,  supporting  the 
notion  that  the  genes  encoding  this  protein  family  are  activated  specifically  during  the 
sexual stages [5, 13, 14]. Since Pbs21 is initially expressed on the surface of zygote stage
parasites, additional post-transcriptional control is exerted by the parasite to regulate
Pbs21 expression. We are interested in investigating pgs28 gene expression to further
understand transcriptional regulation in Plasmodium spp. and as a step towards
understanding  the  control  of  sexual  development  in  P.  gallinaceum.  In  this  report,  we 
present  a  functional  analysis  of  the  5’  flanking  region  of  pgs28,  using  firefly  luciferase  as 
a  reporter,  by  which  we  identified  two  regions  that  are  required  for  pgs28  trans-gene 
expression.  Furthermore,  using  Northern  analysis,  we  define  the  5’  limit  of  the  pgs28 
transcript  and  demonstrate  that  pgs28  transcripts  are  present  during  the  zygote  stage. 

The 5’ and 3’ flanking sequence of pgs28, together with an in-frame insertion of
the luciferase reporter, were previously cloned into pBS (pgs28.1LUC) [15]. In this
study, the pgs28-luc chimera, containing pgs28 5’ flanking sequence, the pgs28-luc
fusion gene, and about 720 bp of 3’ flanking sequence, from pgs28.1LUC was cloned into
the HindIII site of pBS to create BSpgs28-LUC. The 1871 bp 5’ flanking sequence of
pgs28 has been determined and deposited in GenBank. (The sequence and
characterization of the 3’ region was recently published [16].) Expression from
BSpgs28-LUC was confirmed by immunofluorescent antibody staining and immuno-electron
microscopy [17] and also by luciferase assays performed 24 or 48 hrs post-transfection
(see below). High expression levels up to the order of 10^6 light units were obtained,
offering a sensitive system for determining changes in expression levels.

In order to determine the sequence requirements for pgs28 expression, a series of
5’ deletion mutants was created either by exonuclease digestion of linearized
BSpgs28-LUC plasmid, or by PCR mutagenesis. Deletions of 790 bp (FP1081), 1131 bp (FP740),
1358 bp (FP513), 1407 bp (FP464), 1485 bp (FP386), 1538 bp (FP333), 1584 bp (FP287),
1631 bp (FP240) and 1905 bp (FP+34) from BSpgs28-LUC were obtained (Fig. 1A).
Additionally, an internal deletion mutant Δ376-316, which lacks the specified sequences,
was created by inverse PCR. To assess the contribution of the deleted sequences to
pgs28-luc transgene expression, these plasmids were transfected into sexual stage
parasites as previously described [15]. Luciferase activity was assayed after 24 or 48
hours. To control for transfection efficiency, a second plasmid, pgs28-GUS, was
co-transfected and luciferase light units normalized to GUS fluorescence units.
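
The normalization step described above can be sketched as follows. This is an illustration only: the function names and all readings are hypothetical, and the report does not state how replicate determinations were averaged, so taking the mean of per-transfection ratios is an assumption.

```python
from statistics import mean


def normalized_ratios(luc_units, gus_units):
    """Divide each luciferase reading by the GUS reading from the same transfection."""
    if len(luc_units) != len(gus_units):
        raise ValueError("need one GUS reading per luciferase reading")
    return [luc / gus for luc, gus in zip(luc_units, gus_units)]


def percent_of_full_length(mutant_ratios, full_length_ratios):
    """Mean normalized activity of a mutant as a percentage of the parent construct."""
    return 100.0 * mean(mutant_ratios) / mean(full_length_ratios)


# Hypothetical readings, chosen only to make the arithmetic easy to follow.
full_length = normalized_ratios([1_000_000.0, 1_200_000.0], [100.0, 120.0])
mutant = normalized_ratios([50_000.0, 40_000.0], [100.0, 100.0])
print(percent_of_full_length(mutant, full_length))  # 4.5
```

Dividing by the co-transfected GUS signal removes differences in transfection efficiency between samples, so that what remains reflects the activity of the promoter fragment under test.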

Transfection using FP1081 demonstrated that expression of the pgs28-luciferase
fusion gene did not decrease significantly when the 5’ most 790 bp were deleted from the
parent plasmid (Fig. 1B). However, luciferase expression from FP740, where an
additional 340 bp has been removed, was reduced by more than 40%. A modest decrease
in promoter efficiency was observed with the removal of the next 227 bp (FP513).
Further deletions of up to 180 bp (FP464, FP386, FP333) did not seem to significantly
affect expression when compared to FP513. Interestingly, FP287, containing a deletion
of 46 bp 3’ of FP333, had less than 5% activity compared to the full-length construct.
Furthermore, the internal deletion mutant Δ376-316 was also severely affected, having
only 6.6% activity. As expected, a deletion that encompasses part of the pgs28 open
reading frame (FP+34) abolished luciferase expression. The mutant FP240 had equivalent
activity to this construct, suggesting that important elements necessary for transcription
and possibly translation have been removed. Taken together, results of the 5’ deletion
analysis  suggest  that  the  minimal  sequence  necessary  for  pgs28  transgene  expression 
consists  of  the  333  bp  upstream  of  the  translational  start  site.  Moreover,  a  17  base-pair 
sequence,  TACCATTTGTACAGACAG,  between  -333  and  -316,  appears  to  be  crucial, 
since  pgs28  expression  was  essentially  abrogated  in  a  5’  deletion  mutant  and  an  internal 
deletion  mutant  that  lack  these  sequences.  We  suggest  that  the  proximal  site  corresponds 
to  the  basal  promoter  or  initiator  element,  as  indicated  in  the  following  section.  We  also 
suggest  that  positive  regulatory  elements  lie  between  -1081  and  -740,  and  perhaps  within 
-740  and  -513.  This  distal  region  likely  contains  an  enhancer  element(s)  that  contributes 
to  pgs28  promoter  efficiency.  Thus  transcriptional  elements  that  control  pgs28  appear  to 
be  bipartite,  as  in  eukaryotic  promoters  and  other  Plasmodium  genes  that  have  been 
analyzed. 

We  used  Northern  analysis  as  a  preliminary  step  to  map  the  transcriptional  start 
site  of  pgs28,  and  to  determine  whether  the  temporal  pattern  of  pgs28  transcription 
paralleled  that  of  its  murine  homolog  pbs21.  RNA  was  isolated  from  newly  formed 
zygotes collected after exflagellation, and from gametes. As seen in Fig. 2B, an intense
signal  appeared  at  a  position  corresponding  to  a  message  of  about  1.4  kb  in  both  zygote 
and  gamete  when  probed  with  BBm600  (lanes  1  and  2),  which  extends  from  -381  in  the 
5’  flanking  region  to  within  the  pgs28  coding  sequence.  A  pgs28  message  of  1.5  kb  has 
previously  been  reported  by  Duffy  and  colleagues  [11].  Thus,  while  Pgs28  expression  is 
most  abundant  on  ookinete  surfaces,  pgs28  transcript  can  be  seen  as  early  as  the  zygote 
stage.  This  is  in  agreement  with  transfection  studies  in  our  laboratory  using  the  BSpgs28 
construct,  as  well  as  a  pgs28-GFP  fusion,  that  demonstrated  Pgs28  expression  on  the 
surface  of  zygotes  [17]. 

Recently, the polyadenylation signal of pgs28 was mapped to approximately
425 bp downstream of the stop codon, with an estimated poly(dA) tail of at least 20
nucleotides [16]. Given a message of 1.4 to 1.5 kb and a pgs28 coding sequence of 666 bp, the transcription
initiation site of pgs28 would lie approximately between -390 and -290 bp. In agreement
with this estimation, only the probe BV142, encompassing the sequence from -381 to
-240, hybridized to the pgs28 transcript (Fig. 2B, lane 7), while probes corresponding to
sequences  further  upstream  failed  to  hybridize  to  pgs28  mRNA  (lanes  3-6).  Thus,  these 
studies  establish  the  5’  limit  of  the  pgs28  transcript  at  -381  bp  upstream  from  the 
translational  start  site.  5’  deletion  analysis  suggests  that  the  transcriptional  start  site  is 
likely  to  be  downstream  of  -333bp.  Experiments  to  determine  the  precise  5’  end  of  the 
pgs28  transcript  will  resolve  this  aspect  of  pgs28  transcription. 
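
The estimate of the transcription initiation site in the paragraph above follows from simple subtraction. A short sketch of the arithmetic, using only figures quoted in the text (message sizes of 1.4 kb and 1.5 kb, a 666 bp coding sequence, about 425 bp of 3’ sequence and a poly(dA) tail of at least 20 nucleotides); the function name is ours:

```python
def five_prime_utr_length(message_nt, coding_nt, three_prime_nt, poly_a_nt):
    """Length left over for the 5' untranslated region after subtracting
    the coding sequence, the 3' untranslated region and the poly(A) tail."""
    return message_nt - coding_nt - three_prime_nt - poly_a_nt


CODING = 666        # pgs28 coding sequence, bp
THREE_PRIME = 425   # stop codon to polyadenylation site, bp (approximate)
POLY_A_MIN = 20     # minimum estimated tail length, nt

# Message sizes: 1.4 kb (this study) and 1.5 kb (previously reported [11]).
estimates = {size: five_prime_utr_length(size, CODING, THREE_PRIME, POLY_A_MIN)
             for size in (1400, 1500)}
print(estimates)  # {1400: 289, 1500: 389}
```

The two values, 289 and 389 nucleotides upstream of the start codon, correspond to the approximate window of -290 to -390 quoted in the text. A longer poly(A) tail would shift both estimates toward the start codon.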

The  5’  flanking  sequence  of  pgs28  had  been  inspected  for  homology  to  other 
eukaryotic transcriptional regulatory elements. The highly AT-rich region between -1081
and  -520,  typical  of  intergenic  regions  of  Plasmodium  spp.,  does  not  contain  sequences 
that  are  analogous  to  known  eukaryotic  regulatory  elements.  Two  GTAAT  sequences, 
demonstrated  to  be  important  for  GBP130  expression  [6],  can  be  found  in  this  region. 
Whether  an  element  associated  with  an  enhancer  of  an  asexual  stage  gene  in  P. 
falciparum is important for expression of pgs28, a sexual stage specific gene, can only be
determined  by  experimental  means. 

Sequences  downstream  of  -520  have  also  been  examined.  Within  this  region  are 
two putative TATA elements, TAAAAAGAATAA and TATAAATGTTT, centered at
-434 bp and -360 bp respectively from the start codon. Since these sequences can be
deleted  from  the  reporter  constructs  (FP386  and  FP333)  without  drastically  affecting 
expression,  they  are  not  likely  to  be  important  for  pgs28  expression.  This  again  illustrates 
that  sequence  analogy  to  eukaryotic  promoter  elements  does  not  necessarily  imply 
functional  significance  in  Plasmodium  genes.  Inspection  of  the  presumed  5’  UTR  reveals 
a  T-rich  stretch,  constituting  up  to  74%  of  the  bases  between  130bp  and  242bp.  A  series 
of  five  8-base  pair  inverse  repeat  elements  (TTTA  7T7TATTT)  could  be  identified  within 
this  sequence.  Further  examination  of  this  region  uncovers  3  direct  repeats  of  27bp  to 
29bp  in  length.  Whether  these  sequences  have  functions  at  a  post-transcriptional  step  to 
enhance  pgs28  expression  awaits  further  experimentation.  Recently,  transfection  studies 
of  pfs25  promoter  constructs  into  P.  gallinaceum  ookinetes,  and  mobility  shift  assays 
using  P.  gallinaceum  ookinete  nuclear  extracts,  suggest  that  the  sequence  AAGGAATA, 
found at -403 to -396 and -483 to -476 from the initiation codon in pfs25, interacts with
a nuclear factor and is important for expression of pfs25 [5]. A similar sequence,
AAGAATAA, is found at -354 to -347 in pgs28, within the putative proximal TATA
sequence.  Again,  the  transfection  studies  reported  here  suggest  that  this  sequence  in 
pgs28  can  be  deleted  without  severely  affecting  pgs28  transgene  expression.  This 
suggests  that  the  nuclear  factor  PAF-1  [5]  is  not  involved  in  pgs28  transcription,  and/or 
that  it  has  a  stringent  sequence  requirement  that  the  AAGAATTT  sequence  in  pgs28  does 
not  satisfy.  Even  though  pgs28  and  pfs25  belong  to  the  same  family,  and  possess  similar 
expression  profiles  during  the  parasite  life  cycle,  they  may  not  necessarily  be  controlled 
by  the  same  evolutionarily  conserved  factors.  Nonetheless,  given  the  close  evolutionary 
relationship  between  P.  gallinaceum  and  P.  falciparum,  it  would  be  of  great  interest  to 
determine  whether  the  17  bp  upstream  sequence  in  pgs28  between  -333  and  -316  would 
be  able  to  functionally  replace  the  pfs25  sequence,  and  vice  versa. 
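
Locating candidate elements such as AAGGAATA and reporting them in coordinates relative to the initiation codon, as done above, can be sketched as follows. This is an illustrative helper, not the procedure used in this work: the function name is hypothetical, the toy sequence is invented, and the convention that the last upstream base is position -1 is an assumption.

```python
def find_motif(upstream, motif):
    """Return (start, end) positions of every exact motif occurrence.

    `upstream` is the 5' flanking sequence ending immediately before the
    start codon, so its last base is numbered -1. Both coordinates are
    inclusive, e.g. an 8-base motif might span -403 to -396.
    """
    upstream = upstream.upper()
    motif = motif.upper()
    total = len(upstream)
    hits = []
    pos = upstream.find(motif)
    while pos != -1:
        start = pos - total
        hits.append((start, start + len(motif) - 1))
        pos = upstream.find(motif, pos + 1)  # allow overlapping matches
    return hits


# Toy flanking sequence (hypothetical), 25 bases long.
toy = "TT" + "AAGGAATA" + "TTTTT" + "AAGAATAA" + "TT"
print(find_motif(toy, "AAGGAATA"))  # [(-23, -16)]
print(find_motif(toy, "AAGAATAA"))  # [(-10, -3)]
```

As the text stresses, finding such a match only nominates a candidate; in an AT-rich genome short AT-rich motifs occur frequently by chance, so functional testing remains necessary.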

Fig. 1A. Schematic of the pgs28 5' flanking sequence.

The 5' flanking sequence of pgs28 cloned into BSpgs28-LUC is shown, together with part of the pgs28
and luc coding sequence. To obtain BSpgs28-LUC, pgs28ALUC [15] was digested with HindIII and
cloned into similarly digested pBluescript KS+ (Stratagene).

The bars at approximately -440 and -360 represent two putative TATA boxes. The hatched box
downstream of -240 represents the T-rich sequence with internal repeats.

Positions  of  the  5'  deletion  mutants  are  indicated.  The  numbers  refer  to  the  distance  in  nucleotides  away 
from  the  start  of  the  coding  region  (+1). 
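
The construct names can be checked against the deletion sizes reported in the text: each name gives the number of nucleotides of 5' flanking sequence that remain upstream of +1 after deleting from the 1871 bp parent. The sketch below uses only figures from this report (reading the FP740 deletion as 1131 bp); the function names are ours.

```python
FLANK_LENGTH = 1871  # bp of pgs28 5' flanking sequence in BSpgs28-LUC

# Deletion sizes (bp) as reported in the text, keyed by construct name.
deletions = {
    "FP1081": 790, "FP740": 1131, "FP513": 1358, "FP464": 1407,
    "FP386": 1485, "FP333": 1538, "FP287": 1584, "FP240": 1631,
    "FP+34": 1905,
}


def remaining_upstream(deleted_bp, flank_length=FLANK_LENGTH):
    """Flanking sequence left after the deletion; a negative value means
    the deletion extends past +1 into the coding region."""
    return flank_length - deleted_bp


def construct_label(deleted_bp):
    """Rebuild the construct name from the deletion size."""
    kept = remaining_upstream(deleted_bp)
    return f"FP{kept}" if kept >= 0 else f"FP+{-kept}"


for name, deleted in deletions.items():
    print(name, construct_label(deleted))
```

Every reported deletion size reproduces its construct name, including FP+34, where the 1905 bp deletion removes the entire flanking sequence plus the first 34 bp of the open reading frame.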

To generate the 5' deletion mutants FP1081, FP464, FP513, FP386 and FP+34, BSpgs28-LUC
was first digested with SacI and SpeI (New England Biolabs). The linearized plasmid was digested
further with exonuclease III/mung bean nuclease essentially as described by the manufacturer
(Stratagene). E. coli (XL-1 Blue) cells were transformed with ligated products and the sizes of the
plasmids obtained were determined by agarose gel electrophoresis. FP464 was generated by recloning
the filled-in NdeI insert from BSpgs28-LUC into SmaI digested pBS. FP333, FP287 and FP240 were
created by PCR mutagenesis, using FP464 as template, and the upstream primers
5'GAATTCCTGCAGCCCTACCATTTGTACAGAC,
5'GAATTCCTGCAGCCCACTAGCTAAAAGAAATATG, and
5'GAATTCCTGCAGCCCATTTTTATTTAATTTTTC respectively. The PstI site is underlined. The
downstream primer, 5'CTAGAGGATAGAATGGCGCCG, containing an internal SfoI site (underlined),
was used in all cases and was derived from the luc coding region. Purified PCR fragments were
digested with PstI and SfoI and cloned into similarly cut FP464 vector backbone.

To generate Δ376-316, primers WFM48 5'CCATTTGTTATTGTATATAAAAAAAAAAAC
and WFM20R 5'GATCTTCTTAATCTTTGTAAAAATAACTG, which flank the sequences to be
deleted, were used to amplify FP513 that had previously been linearized with BglII, utilizing the
TaqPlus Long PCR system (Stratagene). 30 cycles of PCR reactions were performed in low salt buffer
under the following conditions: 94°C for 1 min, 55°C for 1 min and 72°C for 7 mins. PCR products
were treated with DpnI for 30 mins and further treated with 1 µl of Pfu for 10 cycles and incubation at
37°C for 30 mins. Amplified products were phenol:chloroform extracted, ethanol precipitated, and
resuspended. Amplified DNA containing the deletion was allowed to circularize and transformed into E.
coli (XL-1 Blue) cells. Sequences of all clones were confirmed by DNA sequencing.

B.  Luciferase  expression  from  pgs28  5’  deletion  mutants. 

Parasites were transfected with the indicated plasmids and pgs28-GUS, and luciferase and GUS activity
assessed 24 or 48 hrs post-transfection, as described. The construction of pgs28-GUS, containing an
in-frame insertion of the β-glucuronidase gene (Clontech) within pgs28, has been described elsewhere.
Normalized  relative  light  units  and  SD  are  shown.  The  indicated  activity  is  the  average  of  3-8 
determinations. 

Fig. 2A. Positions of DNA probes used to map the 5' end of pgs28 transcripts.

Probes MS135 (-786 to -651), SN188 (-651 to -463), NB82 (-463 to -381), BV142 (-381 to -240) and
BBm600 (-381 to +217) were made by digesting an XbaI fragment of BSpgs28-LUC containing pgs28
sequences with MwoI/SwaI, SwaI/NdeI, NdeI/BglII, BglII/VspI and BglII/BamHI restriction enzyme
pairs, respectively. These resulted in fragments of lengths indicated by the numerals in the designations.

B.  Determination  of  size  and  the  5’  end  of pgs28  mRNA  from  P.  gallinaceum 

RNA  was  extracted  either  from  zygotes  (lanes  1  and  3)  or  ookinetes  (lanes  2,  4-7),  fractionated  and 
Northern  blotted  using  standard  procedures.  Between  four  and  five  micrograms  total  RNA  obtained  from 
3 x 10^7 parasites were included per lane. Blots were probed with the indicated DNA fragments, washed and
autoradiographed  for  24-48  hours. 

Lanes  1  and  2,  BBm600;  lanes  3  and  4,  MS135;  lane  5,  SN188;  lane  6,  NB82;  lane  7,  BV142. 


5.3 Functional analysis of putative drug resistance and new drug target genes in
the heterologous yeast expression system (for Figures, see Nau et al., 2000 in
Appendix)

1.  Identification  of  new  drug  target  genes  through  complementation  analysis  in 
yeast.  A  single  yeast  strain  with  mutations  in  PDR5/10/SNQ2  has  been 
chosen  for  this  work. 

2.  Development  of  new,  rapid,  high  throughput  drug  screening  methods  for 
malaria  genes  expressed  in  yeast 

This  work  continues  as  previously  and  also  includes  new  initiatives  that  make  use 
of  technology  not  previously  available.  The  use  of  the  Yeast  DNA  Microarray  in 
collaboration  with  the  groups  of  Dr.  Maryanne  Vahey,  Dr.  Dennis  Kyle  and  Dr.  Keith 
Martin,  WRAIR  has  greatly  facilitated  our  work  and  identified  new  approaches  to  drug 
development  using  this  heterologous  system.  The  initial  results  of  this  work  have  been 
submitted  for  publication  and  the  ongoing  work  is  summarized  below. 

(i)  Microarray  analysis 

The  majority  of  the  work  towards  my  thesis  has  involved  the  analysis  of  global 
expression  patterns  of  Saccharomyces  cerevisiae  when  exposed  to  the  antimalarial 
compound  chloroquine.  Our  laboratory  is  interested  in  the  mechanisms  of  drug  resistance 
employed by protozoan parasites with specific interest in the role of membrane transporters
in  resistance.  Chloroquine  is  an  important  compound  for  malarial  treatment  and  an 
understanding  of  the  ways  in  which  organisms  respond  and  develop  resistance  to  this 
compound  is  of  interest.  The  choice  of  yeast  as  the  model  system  for  this  analysis  is 
based  on  several  points: 

•  The  global  expression  response  to  antimalarial  compounds  has  not  been 
investigated  in  any  organism 

•  The  yeast  system  provides  a  unique  combination  of  tools  in  the  form  of  a 
complete  genomic  sequence  with  approximately  70%  annotation  of  function 
and  microarray  technology  that  allows  the  simultaneous  observation  of  all  of 
the  Open  Reading  Frames  (ORFs)  of  the  yeast  genome 

•  Presence  of  a  network  of  ATP-binding  Cassette  (ABC)  transporters 
(pleiotropic  drug  resistance)  with  similarity  to  ABC  transporters  found  in 
other  systems  that  are  involved  in  Multi-drug  resistance  phenotypes  (see  figure 
1  for  list  of  yeast  ABC  transporters) 

•  The  availability  of  yeast  strains  produced  by  functional  knock-out  that  are 
sensitive  to  compounds  of  interest 

This type of analysis will not only provide a better understanding of how the
Pleiotropic  Drug  Resistance  (PDR)  network  of  yeast,  and  the  corresponding  ABC- 
transporters  in  the  parasite  system,  functions  but  will  also  provide  leads  to  additional 
mechanisms  and  homologues  in  the  parasite  system. 

B.  Materials  and  Experimental  Design 

This study is utilizing the Affymetrix GeneChip® Ye6100 yeast chip system.
This  system  is  composed  of  5  chips,  1  test  chip  and  4  yeast  ORF  chips.  The  test  chip  is 
used  for  quality  control  and  contains  representative  ORFs  from  several  organisms  and  a 
set  of  spike  controls  that  are  also  present  on  the  4  main  yeast  ORF  chips. 

The 4 main chips contain probes for approximately 6200 yeast ORFs that cover
virtually the entire yeast genome. Each probe set consists of ~20 sets of 25mer
oligonucleotides that are exact matches to the genomic sequence and corresponding sets
of one base mismatches. The mismatch sets provide controls for background and
non-specific hybridization. The resolution range for this assay is 0.1-100 mRNA molecules
per cell.
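
One simple way to see how the mismatch probes act as a background control is to subtract each mismatch intensity from its paired perfect-match intensity and average across the probe set. The sketch below illustrates that idea only; it is not the vendor's analysis algorithm, the function name is ours, and the intensities are invented.

```python
from statistics import mean


def average_difference(perfect_match, mismatch):
    """Mean of (perfect match - mismatch) over the probe pairs of one ORF.

    Assumption: the two lists are in the same probe order, one mismatch
    intensity per perfect-match intensity.
    """
    if len(perfect_match) != len(mismatch) or not perfect_match:
        raise ValueError("need matching, non-empty intensity lists")
    return mean(pm - mm for pm, mm in zip(perfect_match, mismatch))


# Hypothetical intensities for four probe pairs (a real set has about 20).
pm = [1200, 900, 1500, 1100]
mm = [200, 300, 250, 150]
print(average_difference(pm, mm))  # 950
```

A probe set whose perfect-match and mismatch intensities are similar yields a value near zero, indicating that the signal is mostly background or non-specific hybridization.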

The  yeast  strains  used  for  this  study  are  the  PDR  functional  knockout  YHW1052, 
which  has  functional  disruptions  in  3  ABC-transporters,  and  the  parental  wild-type  strain 
YPH499.  The  specific  genotypes  for  these  strains  are  given  in  table  I. 

Three treatment points were selected based on the growth curves depicted in
figure 6. The three points increase in severity with increase in number: T1 (2 hr,
1.5 mg/ml), T2 (3 hr, 2.5 mg/ml), T3 (4.5 hr, 2.5 mg/ml). The T1 treatment was selected to
examine  the  expression  profile  at  a  point  on  the  growth  curve  just  before  the  two  strains 
diverged  from  one  another  with  the  expectation  that  expression  levels  would  already  have 
significant  differences.  The  T3  point  was  selected  to  examine  the  profile  under  extreme 
drug  stress. 

C.  Results 

The  gross  global  expression  patterns  observed  over  the  three  treatment  points  for 
each  strain  are  depicted  in  figures  7  and  8.  These  graphs  show  all  ORFs  that  had  a 
differential  of  3-fold  in  expression  levels  comparing  drug  treated  to  control.  These  ORFs 
are  divided  into  12  functional  families  by  their  annotations  in  the  Saccharomyces  Genome 
and  MIPS  Genome  Databases.  It  is  interesting  to  note  that  although  roughly  70%  of  the 
yeast  database  is  annotated  with  either  similarity  or  direct  functional  data  the  category  of 
Unknown  Function  (UNK)  still  ranks  as  the  top  group  for  expression  response  to  the  drug 
treatment  as  compared  to  control.  This  has  also  been  observed  in  other  microarray 
studies such as those conducted by Jelinsky and Samson.
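
The selection and grouping described above (ORFs with at least a 3-fold difference between drug treated and control, tallied by functional family) can be sketched as follows. The record layout, ORF names, category labels and expression values are all hypothetical, and treating up- and down-regulation symmetrically is an assumption.

```python
from collections import Counter


def fold_change(treated, control):
    """Signed fold change: +3.0 is 3-fold up, -3.0 is 3-fold down."""
    if treated <= 0 or control <= 0:
        raise ValueError("expression values must be positive")
    ratio = treated / control
    return ratio if ratio >= 1 else -1 / ratio


def tally_responders(records, threshold=3.0):
    """Count ORFs per functional family whose change meets the threshold.

    Each record is (orf_name, family, treated_value, control_value).
    """
    counts = Counter()
    for _orf, family, treated, control in records:
        if abs(fold_change(treated, control)) >= threshold:
            counts[family] += 1
    return counts


# Invented example records.
records = [
    ("ORF_A", "UNK", 900.0, 100.0),        # 9-fold up
    ("ORF_B", "TRANSPORT", 100.0, 400.0),  # 4-fold down
    ("ORF_C", "UNK", 150.0, 100.0),        # 1.5-fold, below threshold
    ("ORF_D", "UNK", 50.0, 200.0),         # 4-fold down
]
print(tally_responders(records))
```

With these toy values the unknown-function family has two responders and the transport family one, mirroring in miniature the observation that unannotated ORFs dominate the response.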

Comparing the two profiles, there is a significant difference in expression response
between the two strains. The peak in expression response, measured by number of ORFs
having a 3-fold differential, for the wild-type parental strain (YPH499) is in T2 while
that of the functional knockout (YHW1052) occurs during T3. The majority of this
peak  response  for  the  YHW1052  strain  is  a  decrease  in  expression  as  compared  to  the 
control  and  the  cells  appear  very  unhealthy  on  visual  inspection.  It  is  possible  that  the 
expression  profile  for  the  functional  knockout  strain  in  T3  is  largely  the  result  of  cellular 
death  processes. 
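
As a rough illustration of the selection criterion described above, the following sketch flags ORFs whose treated-to-control ratio differs by at least 3-fold in either direction and tallies them by functional family. It is a minimal Python sketch and not the pipeline used to produce figures 7 and 8: the function names, the dictionary inputs, the ORF identifiers and the intensity values are all hypothetical, and reading "a differential of 3-fold" as "3-fold or more, up or down" is an assumption.

```python
from collections import Counter


def differential_orfs(treated, control, fold=3.0):
    """Return {orf: ratio} for ORFs changed at least `fold`-fold, up or down.

    `treated` and `control` map ORF names to positive expression intensities.
    ORFs missing from the control, or with non-positive values, are skipped.
    """
    hits = {}
    for orf, t in treated.items():
        c = control.get(orf)
        if c is None or c <= 0 or t <= 0:
            continue
        ratio = t / c
        if ratio >= fold or ratio <= 1.0 / fold:
            hits[orf] = ratio
    return hits


def count_by_family(hits, family_of):
    """Tally flagged ORFs per functional family; unannotated ORFs count as UNK."""
    return Counter(family_of.get(orf, "UNK") for orf in hits)


# Hypothetical intensities and annotations, for illustration only.
treated = {"ORF_A": 900.0, "ORF_B": 100.0, "ORF_C": 50.0, "ORF_D": 120.0}
control = {"ORF_A": 300.0, "ORF_B": 100.0, "ORF_C": 200.0, "ORF_D": 100.0}
families = {"ORF_A": "TRANSPORT", "ORF_B": "METABOLISM"}

hits = differential_orfs(treated, control)
print(sorted(hits))
print(count_by_family(hits, families))
```

Working with ratios in both directions keeps induced and repressed ORFs in the same tally, which matches the way the response is counted in the text; a log-ratio threshold would be an equivalent alternative.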

Another  indication  of  the  differences  in  expression  profiles  between  the  two 
strains  is  the  small  number  of  high  response  ORFs  that  are  in  common  for  the  two  strains. 
Both strains have more than 80 ORFs responding with a 6-fold change but only 11 of
these  are  shared.  Once  again  roughly  half  of  these  ORFs  are  of  unknown  function. 
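
The overlap between the two high-response sets amounts to a set intersection. The sketch below shows one way this could be computed in Python; the function names, ORF identifiers and ratios are hypothetical, and the 6-fold cutoff is applied in both directions as an assumption.

```python
def high_response(ratios, fold=6.0):
    """Names of ORFs whose treated/control ratio changed at least `fold`-fold."""
    return {orf for orf, r in ratios.items()
            if r > 0 and (r >= fold or r <= 1.0 / fold)}


def shared_response(ratios_a, ratios_b, fold=6.0):
    """ORFs that pass the fold threshold in both strains."""
    return high_response(ratios_a, fold) & high_response(ratios_b, fold)


# Hypothetical treated/control ratios for two strains, for illustration only.
wild_type = {"ORF_1": 8.0, "ORF_2": 0.1, "ORF_3": 2.0}
knockout = {"ORF_1": 6.0, "ORF_2": 1.0, "ORF_4": 0.05}

shared = shared_response(wild_type, knockout)
print(sorted(shared))
```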

Challenges associated with this type of analysis are data handling and the
selection of specific targets for further study. The targets selected so far include the
ABC transporters of the PDR network and two members of the Major Facilitator Superfamily
(MFS) of small-molecule transporters.

The PDR transporters were selected based on the interests of our lab as a whole and
the relation of this network of transporters to several aspects of drug resistance.
Observations made by our laboratory and by that of Karl Kuchler, the source of our pdr
yeast strains, indicated overlapping function and substrate specificity for members of the
pdr network. The current study provides an opportunity to investigate the expression
patterns associated with this phenomenon and to elucidate more clearly the
interactions of these transporters.

The MFS transporter SIT1, an iron siderophore transporter, was selected
based on the magnitude of its expression in the two strains and its status as one of the
ORFs shared between the expression responses of the two strains (Lesuisse et al., 1998).
YOR273C, the other MFS member, was selected because our expression
data are supported by an independent functional screen performed by Delling et al., in which
a S. cerevisiae genomic library was screened for the ability to confer resistance to
quinoline ring-containing antimalarial compounds (Delling et al., 1998).

In the wild-type strain (YPH499), PDR5 shows a small but significant increase
in expression in the treated sample compared to control. The PDR5 gene is the member of the
PDR network that shows the greatest similarity to the PfMDR1 gene of Plasmodium
falciparum. In the case of the functional knockout strain (YHW1052) there are
significant  increases  in  three  members  of  the  PDR  network  family,  PDR12,  PDR15,  and 
YOR1,  in  response  to  the  removal  of  PDR5,  PDR10,  and  SNQ2.  This  observation 
appears  to  further  support  the  hypothesis  that  these  transporters  have  overlapping 
responses  and  substrate  specificity. 

YOR273C shows significant expression in both strains and this expression is
supported by Northern Slot analysis.

D.  Summary  and  Future  Directions 

•  Chloroquine  treatment  affects  the  expression  of  >200  ORFs  in  each  strain 

•  Gene  expression  profile  is  dependent  on  genetic  background;  specifically 
the  functional  knockouts  have  a  significant  impact 

•  Expression  of  PDR-related  transporters  supports  the  predicted  roles  and 
the  hypothesis  of  overlapping  response  and  substrate  specificity 

•  There  are  several  indications  for  a  role  of  YOR273C  in  responses  to 
quinoline  ring  compounds 

•  Northern  Slot  analysis  confirms  the  chip  data  on  YOR273C 

Northern  Slot  analysis  will  be  continued  to  confirm  chip  data  on  other  ORFs  of 
interest.  Further  analysis  of  the  array  data  will  be  conducted  using  cluster  and  temporal 
approaches  in  order  to  discover  further  associations  and  patterns.  Our  data  will  also  be 
compared  with  other  published  and  available  array  data  in  order  to  discern  general  stress 
responses and other general response phenomena. Overexpression and knockout
experiments are planned with selected targets. Additional knockout strains for PDR
transporters are in hand and analysis of these is underway.

In addition, we are interested in conducting a further chip analysis of the
original strains with another compound, as yet to be determined (candidates are FK506,
fluconazole, ketoconazole and rhodamine 6G).

Confirmation  of  Complementation  and  Mating  Phenotype 

Recently we demonstrated that expression of PfMDR1 in yeast deficient for ste6
resulted in complementation of the mating phenotype conferred by the native STE6
protein in yeast [Volkman et al., PNAS (95) 92, 8921]. Ruetz et al. [PNAS (96) 93, 9942]
reported complementation of ste6 with PfMDR1, and that expression of PfMDR1
conferred drug resistance in yeast for quinine, quinacrine, mefloquine and halofantrine.
The observation that PfMDR1 expression conferred drug resistance in yeast differs
from our finding that PfMDR1 expression is associated with increased drug sensitivity in
yeast. Recently these data of Ruetz et al. have been retracted [PNAS (99) 96, 1810],
citing that ste6 sequences were identified in yeast transformants believed to contain
PfMDR1. Because of this report we wanted to confirm our original finding that
expression of PfMDR1 in yeast deficient for ste6 restores a mating phenotype, and these
data are reported here. The goals of these experiments were to (1) confirm the previously
observed mating phenotype; (2) demonstrate that mating is due to the presence of
PfMDR1; and (3) show that mating is not due to the presence of ste6. Similar experiments
are independently being conducted in other laboratories to confirm these results.

Two independently derived plasmid constructs containing the PfMDR1 gene were
tested (pYPfMDR1-1 and pYPfMDR1-2). As controls, the same plasmid containing either
no insert (pY) or the ste6 gene (pYste6) was used. Yeast strains used were the
ste6-deficient (ste6) strain SM1563 [MATa trp1 leu2 ura3 his4 can1 ste6::LEU2], into which
plasmids were transformed, and the MATα strain SM1068 [lys1], used to test the ability of the
transformants to confer a mating phenotype. Three independent mating assays were
performed with three single colonies from new transformation experiments for each of
the two PfMDR1 plasmids. Mating assays were performed by mixing 10^7 MATa cells
with 10^8 MATα cells, and plating the mixture on SD plates. The number of diploids
formed after two days was counted; the data from the three mating assays are shown in Table
II. These data confirm that yeast containing either of the PfMDR1 plasmids were able to
complement the STE6 phenotype and restore the ability of yeast deficient for ste6 to
mate.

It was observed that, on average, only approximately one out of every ten to fifty
transformants resulted in successful complementation of the mating phenotype (data not
shown). The reason for this is not known, but presumably plasmid loss or rearrangement
results in the loss of PfMDR1 expression in these cells. Freshly transformed cells were
more likely to yield transformants that complemented ste6, and for these experiments,
three of five colonies isolated for each plasmid (pYPfMDR1-1 or pYPfMDR1-2)
successfully mated. Experiments performed in our laboratory used yeast strains with
distinct genetic backgrounds, and different plasmid constructs expressing PfMDR1 that
were derived independently from those used in the work by Ruetz et al. When plasmids
containing PfMDR1 sequences received from Dr. Philippe Gros (pVT-PfMDR) were
tested in our yeast assay system, these transformants did not restore a mating phenotype
(Table II).

Table II: Summary of Mating Phenotype for Yeast Transformed with PfMDR1

Yeast Transformant     I          II           III
pY                     0          0
pY ste6                >2000*
[label illegible]      207        79
pYPfMDR1-1.2           241        139
pYPfMDR1-1.3           189        [illegible]
pYPfMDR1-2.1           95         82           196
pYPfMDR1-2.2           133        114          134
pYPfMDR1-2.3           119        109          177

Yeast Transformant     IV         V
pY                     0          0
pY ste6                >2000*
pYPfMDR1               79         101
pVT-PfMDR-1            0          0
[label illegible]      0          0

Yeast transformed with PfMDR1 plasmid conferred a mating phenotype. MATa yeast deficient for ste6
(SM1563) were transformed with pYPfMDR1, and three separate transformants for each of the two
independently derived yeast expression plasmids containing PfMDR1 (pYPfMDR1-1 and pYPfMDR1-2)
were analyzed. In three separate mating assays, a total of 10^7 MATa cells and 10^8 MATα cells were
incubated, and the number of diploids recovered for each experiment (I-III) is reported. Additional
experiments using two pVT-PfMDR plasmids (Ruetz et al.) were performed (IV-V). *The number of
diploids for the ste6 control was estimated by plating a dilution of the mixture.


To test if the observed mating phenotype was due to the presence of the PfMDR1
gene, and to demonstrate that the native ste6 gene is not present in these yeast
transformants, the polymerase chain reaction (PCR) was performed
using gene-specific primers. This analysis was performed both on total DNA derived
from yeast transformants that had successfully mated and on plasmid DNA
recovered from bacteria transformed with a sample of this total DNA. Primer sequences
for ste6 were derived from nucleotides 755-780 and 2069-2095 and amplified a product
of approximately 1340 nucleotides, while primer sequences for PfMDR1 were derived
from nucleotides 510-534 and 1462-1487 and amplified a product of approximately 980
nucleotides. These data demonstrate that DNA derived from yeast that conferred a
mating phenotype contained PfMDR1 sequences for yeast transformed with either
pYPfMDR1-1 or pYPfMDR1-2, and did not contain contaminating ste6 sequences.
Similarly, plasmids derived from bacteria transformed with these DNA samples contained
the expected PfMDR1 sequences from yeast transformed with either pYPfMDR1-1 or
pYPfMDR1-2, and did not contain ste6 sequences. These data demonstrate that yeast
transformed with PfMDR1 that conferred a mating phenotype contain PfMDR1, but not
ste6. Together these data demonstrate that yeast deficient for ste6 that are transformed
with PfMDR1 regain a mating phenotype, and that this mating phenotype is due to the
presence of PfMDR1 and not contaminating ste6 sequences.
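
The expected product sizes quoted above follow from the primer coordinates. The short Python sketch below reproduces the arithmetic under the assumption that each pair of coordinates refers to a single numbering of the gene and that the product runs from the first nucleotide of the upstream primer to the last nucleotide of the downstream primer; the function name is hypothetical.

```python
def amplicon_length(forward_start, reverse_end):
    """Length of a PCR product spanning forward_start..reverse_end inclusive."""
    if reverse_end < forward_start:
        raise ValueError("reverse primer must lie downstream of forward primer")
    return reverse_end - forward_start + 1


# Primer coordinates as quoted in the text.
ste6_product = amplicon_length(755, 2095)
pfmdr1_product = amplicon_length(510, 1487)
print(ste6_product, pfmdr1_product)  # 1341 978
```

Both values agree with the approximate sizes given in the text (about 1340 and about 980 nucleotides).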


(6)  Key  Research  Accomplishments 

Development  of  the  Serial  Analysis  of  Gene  Expression  (SAGE)  system  for 
Plasmodium  falciparum 

Analysis  of  5’  UTR  of  Plasmodium  falciparum  genes 

Identification  and  Functional  Analysis  of  Plasmodium  cis-regulatory  elements  for 
gene  expression 

Development of Yeast Microarray

Wirth DF. Invited Speaker. XVII Whitehead Symposium, Biology of Drug Discovery,
October  24-26,  1999,  Massachusetts  Institute  of  Technology,  Cambridge,  MA 

Wirth  DF.  Invited  Speaker.  Gordon  Research  Conference,  June  20-25,  1999,  Newport, 
R.I. 

Wirth DF. Invited Speaker. Molecular Approaches to Malaria, February 2-5, 2000.

Wirth DF. Invited Keynote Speaker. World Health Week, March 23, 2000. Vanderbilt

(8) CONCLUSIONS

Malaria represents a major and increasing threat to the U.S. Military. Many of the
sites of current or potential U.S. Military involvement are endemic for malaria, and in
several sites multidrug-resistant P. falciparum represents a major problem, especially for
non-immune military personnel. Current drugs available to the U.S. Military are quickly
losing their effectiveness because of emerging and spreading drug resistance. This work
is directed at identifying new drugs and drug targets and, equally importantly, toward
an understanding of drug resistance mechanisms with the goal of preventing or
overcoming drug resistance in the malaria parasite.

A  new  strategy  for  drug  development  is  urgently  needed.  Current  drugs  are  based 
on  a  small  number  of  target  molecules  or  lead  compounds  and  in  most  cases  the  target  of 
drug  action  is  yet  to  be  identified.  Resistance  is  emerging  rapidly  and  the  mechanisms  of 
resistance  are  poorly  understood.  The  identification  of  new  targets  or  new  candidate 
drugs  based  on  an  understanding  of  the  parasite  biology  are  key  elements  in  this  new 
strategy.  Clearly  the  development  of  a  new  antimalarial  will  require  both  basic  and 
applied  research  working  in  concert  with  one  another. 

The  goal  of  this  work  is  to  use  a  molecular  genetic  approach  both  in  the 
identification  of  new  drug  targets  and  in  the  investigation  of  mechanisms  of  drug 
resistance.  Progress  has  been  made  in  several  key  areas.  During  this  year  we  have  tried 
new  technical  approaches  to  address  the  key  goals  of  this  work.  These  technical 
approaches  were  not  available  at  the  time  of  the  original  plan  and  are  based  on  the  rapidly 
evolving  genome  projects,  including  the  completion  of  the  yeast  genome  sequence  and 
the development of the Plasmodium falciparum genome project. We have used these
advances to develop methods for understanding gene expression in response to
drug treatment, and in the future hope to use these methods to identify new drug targets.

Abstract


The  Pgs28  protein  is  a  major  surface  antigen  of  the  sexual  stages  of  Plasmodium  gallinaceum  —  the  zygotes  and 
the  ookinetes.  The  protein  contains  conserved  motifs,  namely  an  N-terminal  signal  sequence,  four  epidermal  growth 
factor-like repeats and a C-terminal hydrophobic domain that serves as a signal for glycosylphosphatidylinositol
(GPI)-anchor modification. In this study, we define the protein motifs required for the surface localization of
Pgs28  in  ookinetes,  using  transient  transfection  combined  with  immunofluorescence  and  confocal  microscopy.  Pgs28 
fused  to  the  green  fluorescent  protein  (Pgs28-GFP)  is  expressed  in  zygotes,  intermediate  retort  forms  and  ookinetes. 
Mutational  analyses  of  Pgs28  coding  regions  reveal  that  deletions  of  the  signal  sequence  and  the  C-terminal  domain 
result  in  intracellular  retention  of  the  fusion  protein.  Therefore,  the  signal  sequence  and  C-terminal  domain  are 
required  for  cell  surface  localization.  Additionally,  the  Pgs28-GFP  fusion  proteins  are  shed  from  the  surface  of  live 
ookinetes,  suggesting  that  Pgs28  may  be  involved  in  interactions  with  the  cells  of  the  mosquito  midgut  or  during 
motility.  ©  2000  Elsevier  Science  B.V.  All  rights  reserved. 

Keywords: Plasmodium gallinaceum; Sexual stages; Pgs28; Signal sequence; GPI-anchor; Membrane shedding


Abbreviations: EGF, epidermal growth factor; ER, endoplasmic
reticulum; FITC, fluorescein isothiocyanate; GFP,
green fluorescent protein; GPI, glycosylphosphatidylinositol;
IFA, immunofluorescence assay; PBS, phosphate-buffered saline;
Pgs28, Plasmodium gallinaceum sexual stage protein of 28
kDa; Pxs21/25, family of proteins expressed in the sexual
stages of all Plasmodium species, molecular weight 21-25 kDa.

1. Introduction

The Pxs21/25 proteins (21-25 kDa proteins
found in all Plasmodium species) are targets for
transmission-blocking vaccines [1]. These proteins
include the Pfs25 and Pfs28 proteins of Plasmodium
falciparum [1], Pbs21 from Plasmodium
berghei [2,3] and Pgs28 from Plasmodium gallinaceum
[4]. Numerous studies have focused on
the transmission-blocking abilities of antibodies
raised against recombinant Pfs25 and Pfs28 expressed
in heterologous systems [5], while protein
trafficking of Pbs21 has been elegantly studied in
insect cells [6]. We are interested in defining the
protein motifs that are essential for targeting
Pgs28 to the surface of cells that normally express
the protein, P. gallinaceum ookinetes. These cells
provide a novel biological system for the study of
many proteins currently being developed as vaccine
candidates. Additionally, P. gallinaceum is an
ideal system for the analysis of Pgs28 localization
as sexual stages can be readily isolated and transfected
with expression vectors. Hence, rather than
using a heterologous system, we have studied
Pgs28 protein trafficking in the sexual stages of P.
gallinaceum.

The Pxs21/25 proteins provide a useful tool to
study trafficking in ookinetes. These proteins contain
distinct motifs, including a 21 amino acid
N-terminal signal sequence that is cleaved from
the mature protein, four to six EGF-like repeats
and a C-terminal hydrophobic region that provides
the signal for anchoring to the membrane
via a glycosylphosphatidylinositol (GPI) modification
[4]. The C-terminal domain is cleaved prior
to addition of the GPI moiety. In eukaryotic
systems, signal sequences have been shown to be
essential for targeting proteins to the endoplasmic
reticulum (ER) [7], while EGF repeats, first described
in epidermal growth factor, are important
for protein-protein interactions in cell adhesion
and signaling during neurological development
and coagulation [8]. The importance of the signal
sequence and C-terminal region of Pbs21 has been
demonstrated in insect cells [6]. Deletion of the
signal sequence prevented transport of Pbs21 to
the ER of the insect cells, while deletion of the
GPI-anchor disrupted Pbs21 translocation
through the ER and distribution on the cell surface.
Moreover, deletion of the GPI-anchor resulted
in the secretion of recombinant protein into
the culture medium.

While studying protein localization to parasite
membranes, it is important to appreciate that the
cell surface is a dynamic structure. Shahabuddin
et al. have shown that the ookinete surface is
efficiently labeled with a lipophilic dye, PKH26;
this dye is shed from the motile parasite, as evidenced
by trails behind the ookinete, suggesting
that the ookinete surface membrane is sloughed
off during movement [9]. Although highly likely,
it is unclear whether membrane-bound proteins
like Pgs28 are also shed from the surface of the
ookinete.

In this work, we study localization of the Pgs28
protein in the sexual stages of the chicken malarial
parasite, P. gallinaceum, and reveal three aspects
of Pgs28 protein targeting. First, by deletion
analysis, we show that the signal sequence and
C-terminal hydrophobic region of Pgs28 are essential
for cell surface localization of the protein
in ookinetes. Deletion of the signal sequence
(amino acids 1-21) results in cytoplasmic localization
of Pgs28, suggesting that this motif directs
nascent Pgs28 into the ER of the ookinete. Similarly,
deletion of the C-terminal domain (amino
acids 194-212) leads to Pgs28 accumulation
within the ookinete. This is consistent with the
requirement for a C-terminal GPI-anchor for
membrane localization. Thus, protein motifs required
for surface localization of Pgs28 in the
native system (P. gallinaceum) are similar to those
defined for Pbs21 in Sf9 cells, revealing universal
themes in trafficking of Pxs21/25 proteins. Second,
vesicular structures, shown to contain Pgs28
protein, are visualized by immunoelectron microscopy.
These may be components of the
ookinete trafficking machinery. Finally, we show
that a Pgs28-GFP fusion protein is shed from the
surface membranes of live ookinetes. These data
have implications for the role of Pgs28 in parasite
motility and interactions with the mosquito
midgut.


2.  Materials  and  methods 


2.1. Construction of deletions in the Pgs28 coding
region

The parent plasmid from which all constructs
were derived was the Pgs28.1-luc vector that has
been previously described [10]. Briefly, Pgs28.1-luc
contains a pgs28-luciferase gene fusion flanked by
~1.9 kilobases of pgs28 5' sequences and ~0.6
kilobases of pgs28 3' sequences cloned into the
pGEM plasmid. The pgs28-luciferase fusion gene
under the control of these flanking sequences is
efficiently expressed in the sexual stages of P.
gallinaceum [11].

All restriction enzymes were obtained from
New England Biolabs (NEB) and used with the
supplied buffers. Deletions in the Pgs28 coding
region that preserved the reading frame of the
protein were generated using a PCR-based mutagenesis
strategy. Initially, the expression plasmid,
Pgs28.1-luc, was digested with BamHI to release
the luciferase gene. Next, depending on the deletion,
the BamHI-digested plasmid was further
incubated with BglII (deletions of the first EGF-like
repeat and the signal sequence), SacI (deletion
of the second EGF-like repeat) or BsgI
(deletion of C-terminal domain). PCR reactions,
using the Pgs28.1-luc plasmid as template and the
following sets of primers, were performed.

2.1.1. Deletion of the first EGF-like repeat
(amino acids 28-71)

SP5: 5' GGT TTG TGG ACA ATG G 3'
SP7: 5' GTG GGA TCC GAA GGT TCA TCA TCT GAA GG 3'

The BamHI site is underlined. The PCR
product was digested with BglII and BamHI and
cloned into the digested Pgs28.1-luc vector.

2.1.2. Truncation of the second EGF-like repeat
(amino acids 72-103)

SP8: 5' CGT AGG ATC CAA AGG AAT GTG GAG AAG G 3'
-40 Universal primer: 5' GTT TTC CCA GTC ACG ACG TTG TA 3'

The BamHI cloning site is underlined. The
PCR product was digested with SacI and BamHI
and cloned into the Pgs28.1-luc vector.
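
The BamHI recognition sequence, GGATCC, can be located in primers SP7 and SP8 with a simple string search. The Python sketch below is only an illustration; the helper name is hypothetical, and the primer sequences are copied from the text with the spacing removed.

```python
def find_sites(primer, site):
    """Return 0-based start positions of `site` in a primer written 5' to 3'."""
    seq = primer.replace(" ", "").upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


BAMHI_SITE = "GGATCC"
SP7 = "GTG GGA TCC GAA GGT TCA TCA TCT GAA GG"
SP8 = "CGT AGG ATC CAA AGG AAT GTG GAG AAG G"

sp7_sites = find_sites(SP7, BAMHI_SITE)
sp8_sites = find_sites(SP8, BAMHI_SITE)
print(sp7_sites, sp8_sites)  # [3] [4]
```

In both primers the site sits a few bases in from the 5' end, leaving a short stretch of flanking sequence upstream of the recognition sequence.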

2.1.3. Deletion of the C-terminal domain (amino
acids 194-212)

SP9: 5' CTC ATA AAG GCC AAG AAG QG 3'
SP10: 5' CAT TGA AAA GGG ATT AGG
TGC TAT TAC CTG CAC TAG GAG GTG
G 3'

Nucleotides in bold and underlined differ from
the pgs28 gene sequence; the T was included to
incorporate a stop codon while the CTGCAC
sequence is the binding site for the type IIS restriction
enzyme BsgI. The inclusion of the BsgI site
alters a serine residue to alanine. The PCR
product was digested with BamHI and BsgI and
cloned into the digested Pgs28.1-luc vector.
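
The BsgI recognition sequence is usually written GTGCAG; the CTGCAC quoted above is the same site read on the opposite strand. A minimal Python sketch of the reverse-complement relationship follows; the function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


bsgi_in_primer = "CTGCAC"
print(reverse_complement(bsgi_in_primer))  # GTGCAG
print(reverse_complement("GGATCC"))        # GGATCC
```

Unlike the palindromic BamHI site, which reads the same on both strands, the BsgI site is asymmetric, so its orientation in the primer determines on which side of the site the enzyme cuts.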


2.1.4. Deletion of the signal sequence (amino
acids 1-21)

Two separate PCR reactions were performed to
generate the signal sequence deletion. The first
PCR reaction used the following primers:

SP5: described above
SP13.1: 5' GGC CGG CCG GCC ATG
GAC TAG GAA TTT TCA TTT TTT TAA
ATA AAT G 3'

Nucleotides in bold and underlined are different
from pgs28 gene sequences; CCATGG is an
NcoI restriction site while CAT is a start codon
(antisense strand). This PCR product was digested
with BglII and NcoI.

SP14: 5' ATC GTA CCA TGG GCT CCT
TCA GAT GAT G 3'

Luc.seq: 5' TCT AGA GGA TAG AAT
GGC GC 3'

The NcoI site is in bold and underlined. This
PCR product was digested with NcoI and
BamHI. The two PCR products were cloned
simultaneously into BglII/BamHI-digested
Pgs28.1-luc in a three-way ligation. The NcoI site
introduces two additional amino acid residues
(glutamine and tryptophan).

The luciferase gene was re-introduced into the
deletion plasmids via the BamHI site. All deletion
plasmids were sequenced through the junctions as
well as the coding regions that had been PCR-amplified
to ensure lack of PCR-generated
mutations.

To replace luciferase with green fluorescent
protein (GFP), the Pgs28.1-luc plasmid and its
variants were digested with BamHI to drop out
the luciferase gene. The GFP (Superglow GFP
[12]) coding region was amplified with the following
primers from the pJH23 yeast expression vector.
BglII sites are underlined.

SP11: 5' GGG ATA GGG AGA TCT AAT
GGC TAG CAA AGG AG 3'

SP12: 5' GGG TTT AGA TCT AAG CAG
CCG GAT CCT TTG TC 3'

The resulting PCR product was digested with
BglII and ligated into the bacterial expression
vector, pRSET C (Invitrogen), at the unique BglII
site. Expression of GFP was confirmed by transforming
JM109(DE3) Escherichia coli (Promega)
with the pRSET-GFP construct and plating on
LB medium with 100 µg ml^-1 ampicillin. The
presence of low levels of T7 polymerase in JM109(DE3)
cells results in expression of GFP from the
pRSET vector, and this GFP expression was assessed
by placing the bacterial plate on a low
wavelength UV trans-illuminator. GFP-positive
colonies emitted a green fluorescence when exposed
to UV light. The pRSET-GFP plasmid
from these GFP-positive colonies was isolated and
digested with BglII to release the GFP coding
region; due to compatibility of BglII and BamHI
overhangs, GFP was then ligated into BamHI-digested
Pgs28.1 plasmid and the deletion vectors.
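
The compatibility of BglII and BamHI ends can be illustrated with a short Python sketch. BamHI cuts G^GATCC and BglII cuts A^GATCT, so both leave the same four-base 5' overhang. The helper name and the tuple representation of each enzyme are hypothetical conveniences, and the helper assumes a palindromic site cut symmetrically on both strands.

```python
def five_prime_overhang(site, cut):
    """5' overhang left by a palindromic site cut `cut` bases in on each strand."""
    return site[cut:len(site) - cut]


BAMHI = ("GGATCC", 1)  # G^GATCC
BGLII = ("AGATCT", 1)  # A^GATCT

bam_overhang = five_prime_overhang(*BAMHI)
bgl_overhang = five_prime_overhang(*BGLII)

# Top-strand junction when a BamHI-cut end is joined to a BglII-cut end.
junction = BAMHI[0][:1] + bam_overhang + BGLII[0][-1:]
print(bam_overhang, bgl_overhang, junction)  # GATC GATC GGATCT
```

The hybrid junction matches neither recognition sequence, so a BglII fragment ligated into a BamHI site cannot be excised again with either enzyme.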

The coding regions of all Pgs28 deletion plasmids
were sequenced and no PCR-generated errors
were detected.

2.2. Parasites and transfections

P. gallinaceum parasites were propagated in
White Leghorn chickens by serial injection into
wing veins. At parasitemias of 50-70%, blood
was withdrawn by heart puncture. Gametogenesis
was induced as described previously [10], with the
inclusion of xanthurenic acid (Sigma) [13] at a
final concentration of 50 µM in the exflagellation
buffer. Gametes and zygotes were purified, also as
described previously, and 1 x 10^7 cells were electroporated
(BioRad) with 100 µg of DNA (QIAGEN)
at settings of 25 µF and 0.5 kV in 0.2 cm
cuvettes (BioRad). Parasites were incubated at
[temperature illegible] in Medium 199 (Gibco-BRL) and harvested
for analysis at approximately 48 h after
transfection.


2.3.  Immunofluorescence  and  confocal  microscopy 

Transfected parasites were washed once with
phosphate-buffered saline (PBS) and allowed to
adhere onto poly-L-lysine coated slides. After fixation
in 4% paraformaldehyde, cells were either
permeabilized with 0.1% Triton X-100 and 50
mM glycine in PBS for 15 min or allowed to
remain non-permeabilized. Subsequently, cells
were stained with primary antibodies mAbIID2-B3B3
(1:50 dilution) to recognize endogenous
Pgs28 and polyclonal affinity-purified anti-GFP
antibody (1:500 dilution). Secondary antibodies
(goat anti-mouse-rhodamine and goat anti-rabbit
fluorescein, Boehringer-Mannheim) were used at
1:250 dilutions. Confocal microscopy was performed
on a BioRad MRC-1024 laser scanning
confocal microscope.

2.4.  Double  labeling  immunoelectron  microscopy 

Ookinetes were fixed for 30 min at 4°C with 1%
formaldehyde, 0.1% glutaraldehyde in 0.1 M
phosphate buffer, pH 7.4. Fixed samples were
washed, dehydrated and embedded in LR White
resin (Polysciences, Inc., Warrington, PA). Thin
sections on nickel grids (without a supporting
film) were blocked in PBSB-Tween for 30 min.
PBSB-Tween consists of PBS supplemented with
1% w/v bovine serum albumin fraction V and
0.01% v/v Tween 20.
Labeling for the first antigen was done on face 'A'
of the grid and for the second antigen on face 'B'
of the grid. Briefly, grids (face A) were incubated
with anti-luciferase antibody diluted 1:20 in
PBSB-Tween for 2 h at 25°C. Negative controls
included normal rabbit serum and PBSB-Tween
applied as the primary antibody. After washing,
grids were incubated at room temperature for 1 h
in 10 nm gold-conjugated goat anti-rabbit IgG
(Amersham Life Sciences, Arlington, IL) diluted
1:20 in PBSB-Tween, and rinsed with PBSB-Tween.
Grids (face B) were then incubated with anti-Pgs28
antibody diluted 1:5-1:10 in PBSB-Tween
for 2 h at 25°C, and then incubated in 15 nm
gold-conjugated goat anti-mouse IgG (Amersham).
Negative controls included normal mouse
serum and PBSB-Tween applied as the primary
antibody. Grids were then fixed with 2.5% glutaraldehyde
to stabilize the gold particles. Samples
were stained with uranyl acetate and lead
citrate, and then examined with a Zeiss CEM902
electron microscope (Zeiss, Oberkochen,
Germany).


3.  Results 

3.1.  Pgs28  mutant  proteins  are  expressed  in 
sexual  stage  parasites 

To identify the protein motifs that direct Pgs28
to the cell surface, deletion constructs were generated
in expression vectors that express Pgs28
fused to different reporter genes (luciferase or
GFP). Deletions were made in motifs that were
predicted to be important in surface localization
based on other studies: the signal sequence (ΔSS)
and the C-terminal hydrophobic domain (ΔC-term).
Additionally, motifs that would not be
expected to play roles in localization were also
analyzed as controls. Thus a deletion was generated
in the first EGF-like repeat (ΔEGF1), while a
truncation was made in the second EGF-like repeat
(ΔEGF2). Fig. 1 shows a schematic diagram
of the deletion constructs. In order to assess the
levels of expression of the mutant proteins, a
luciferase reporter was fused in-frame to the mutant
genes via a unique BamHI site in the second
EGF-like repeat. Luciferase assays of sexual stage
parasites transiently transfected with the deletion
constructs showed that all proteins were expressed
in sexual stage parasites (data not shown).

Fig. 1. Schematic of in-frame deletions of the Pgs28 protein.
All proteins were expressed from the Pgs28.1 expression vector
described in Goonewardene et al. AUG represents the start
codon and TAA the stop codon. A unique BamHI restriction
site within the second EGF-like repeat was used to clone the
reporter genes into Pgs28. Reporters used were Luc, luciferase;
GFP, green fluorescent protein.

The sub-cellular localization of Pgs28 deletion proteins was assessed
using GFP as a reporter. Parasites were transfected with a Pgs28-GFP
expression vector; localization of fusion proteins in non-permeabilized
cells was assessed using immunofluorescence assays (IFA) and confocal
microscopy. Double labeling of transfected parasites for both endogenous
Pgs28 protein and the Pgs28-GFP fusion protein was performed, using
secondary antibodies conjugated to rhodamine (endogenous Pgs28) and
fluorescein (Pgs28-GFP) to distinguish the two signals.

Fig. 2 shows that the Pgs28-GFP fusion protein was expressed in zygotes
(Panel A), ookinetes (Panel C) and the retort forms (Panel B).
Approximately 1-10% of the cells showed GFP-positive staining,
indicating a robust efficiency of transfection and expression. Panel A
shows a zygote expressing Pgs28-GFP protein that co-localizes with
endogenous Pgs28 protein; also visible is a zygote that was not
transfected (arrow). In non-permeabilized ookinetes (Panel C), both
Pgs28 and Pgs28-GFP show a pattern of staining that is most intense on
the cell surface. Hence, Pgs28 fused to the GFP reporter shows
colocalization with endogenous Pgs28, and in ookinetes that have not
been permeabilized, both proteins are found predominantly on the cell
surface, as expected.

Having shown that Pgs28 and Pgs28-GFP proteins localize predominantly to
the cell surface, we employed a well-characterized strategy to obtain
initial data regarding the localization of the mutant forms of
Pgs28-GFP. Cells were processed for IFA and confocal microscopy without
prior permeabilization; hence only cell surface-exposed proteins should
be available for antibody recognition.

The ΔEGF1-GFP and ΔEGF2-GFP fusion proteins were present on the cell
surface of non-permeabilized ookinetes in a pattern similar to that
observed for Pgs28-GFP and at a frequency similar to that observed for
Pgs28-GFP (GFP-positive cells were 1-10% of total cells). Hence,
deletion of the EGF-repeats appears to have little effect on
localization of Pgs28. However, without permeabilization, cells
transfected with an expression vector encoding the ΔC-term-GFP protein
exhibited no GFP fluorescence as detected by IFA and confocal
microscopy. Similarly, cells transfected with an expression vector
containing the ΔSS-GFP open reading frame also showed no GFP signal
(GFP-positive cells not detected over hundreds of fields). From these
results in non-permeabilized ookinetes, the Pgs28-GFP, ΔEGF1-GFP and
ΔEGF2-GFP proteins appear to be on the surface of ookinetes while
ΔC-term-GFP and ΔSS-GFP proteins may be intracellular.

Fig. 2. IFA and confocal microscopy of sexual stages of P. gallinaceum
expressing Pgs28-GFP fusion proteins. These cells have not been
permeabilized. Panel A, zygotes; Panel B, retort form; Panel C,
ookinete. Parasites were stained with antibodies against endogenous
Pgs28 (detected with secondary antibodies conjugated to rhodamine, red
staining) and GFP (detected with secondary antibodies conjugated to
fluorescein, green staining). Colocalization of endogenous Pgs28 and
Pgs28-GFP results in yellow fluorescence. The arrow denotes a zygote
that has not been transfected.

Fig. 5. Live, motile ookinetes shed Pgs28-GFP from their cell surface.
Ookinetes observed under Nomarski optics (Panel A) and FITC filter
(Panel B). The FITC images have been overexposed to reveal the
fluorescent trails shed from the parasites.

S. Patankar et al. / Molecular and Biochemical Parasitology 111 (2000) 425-435

3.2.  The  signal  sequence  and  C-terminal 
hydrophobic  region  of  Pgs28  are  required  for  cell 
surface  localization 

To further analyze the sub-cellular localization of the Pgs28-GFP
protein and its mutant derivatives, confocal microscopy was performed on
permeabilized cells double-labeled with antibodies against endogenous
Pgs28 and GFP. Fig. 3 shows representative images of stained ookinetes
permeabilized with Triton X-100. In contrast to non-permeabilized cells
shown in Fig. 2, most permeabilized cells showed intracellular staining
of both endogenous Pgs28 (Fig. 3, Panel A) and the GFP fusion proteins
(Fig. 3, Panels B-D). Some permeabilized ookinetes showed predominant
cell surface staining (cells denoted by arrows in Fig. 3, Panels C and
F). The intracellular staining may indicate the presence of Pgs28 in
vesicles that comprise the trafficking machinery. The Pgs28-GFP,
ΔEGF1-GFP and ΔEGF2-GFP fusion proteins showed colocalization with
endogenous Pgs28, as evidenced by yellow staining on both the cell
surface and internal structures (Fig. 3, Panels B-D). Moreover,
endogenous Pgs28 and GFP-fusion proteins appear to be excluded from the
nuclear region of ookinetes.

In contrast, deletion of the signal sequence (Fig. 3, Panel F) resulted
in intracellular retention of the GFP fusion protein while endogenous
Pgs28 was still localized to the surface. Staining for the ΔSS-GFP
protein was diffuse, suggesting cytoplasmic localization. Deletion of
the C-terminal domain also resulted in intracellular localization of
GFP-fusion protein (Fig. 3, Panel E). However, in contrast to the
uniformly diffuse intracellular signal obtained with the ΔSS-GFP
proteins, the ΔC-term-GFP proteins appeared to be distributed within the
ookinete in a pattern similar to that seen for Pgs28-GFP (Fig. 3, Panel
B). Hence, deletion of either the signal sequence or the C-terminal
domain results in intracellular retention of Pgs28, with ΔSS-GFP
staining appearing diffuse and cytoplasmic. Similar results were
obtained during IFA and confocal analysis of ΔC-term and ΔSS proteins
fused to luciferase (Fig. 3, Panels G and H), as well as upon direct
visualization of live parasites expressing the ΔC-term-GFP and ΔSS-GFP
fusion proteins (Fig. 3, Panels I and J).

3.3.  Immunoelectron  microscopy  defines  organelles 
involved  in  trafficking  of  Pgs28 

Immunoelectron microscopy was used to identify organellar structures
through which trafficking of the Pgs28 protein occurs (Fig. 4). Analysis
of ookinetes revealed that endogenous Pgs28 proteins were present both
on the cell surface and in intracellular vesicles (Fig. 4, large gold
particles) and that Pgs28-luciferase fusion proteins were localized to
the same compartments as endogenous Pgs28 (Fig. 4, small gold
particles). Therefore, these vesicular organelles appear to be
components of the trafficking machinery that transports Pgs28 to the
surface membrane of the ookinete, and may constitute the ER-Golgi
network.

3.4.  Pgs28-GFP  proteins  are  shed  from  the 
surface  of  motile  ookinetes 

Previous work by Shahabuddin et al. has shown that P. gallinaceum
ookinetes can be stained with the lipophilic dye PKH26. Three to four
hours post-staining, the dye begins to shed from the posterior end of
the ookinetes, suggesting that the ookinete surface membrane is a
dynamic structure, possibly due to the motile nature of the ookinete
[9]. This observation suggested that localization of Pgs28 also might be
affected by motility of the ookinete and prompted us to ask whether
Pgs28 was shed from the surface of migrating ookinetes. Pgs28-GFP
provides a facile tool to address this question, as live parasites can
be visually monitored by microscopy for localization of the GFP fusion
protein. Live parasites were placed on slides for 4 h, then observed
with Nomarski optics (Fig. 5, Panel A) and under a FITC filter (Fig. 5,
Panel B). While no trail was distinctly visible with Nomarski optics,
Pgs28-GFP fluorescence revealed that the motile ookinete shed a
fluorescent trail during its movements on the slide. Similar results
were obtained for both the ΔEGF1-GFP and ΔEGF2-GFP proteins, but no
fluorescent trails were detected when ookinetes expressed ΔSS-GFP and
ΔC-term-GFP (Fig. 3, Panels I and J). These data indicate that Pgs28 is
shed from the surface of motile ookinetes and support the hypothesis
that deletion of the signal sequence or C-terminal domain of Pgs28
results in intracellular retention of the protein.

Fig. 4. Immuno-electron microscopy showing localization of endogenous
Pgs28 and Pgs28-luciferase fusion proteins. Large gold particles (15 nm)
label Pgs28 while small gold particles (10 nm) label Pgs28-Luc. Arrows
indicate proteins present on the cell surface and internal vesicles.


4. Discussion

The sexual stages of P. gallinaceum have been exploited effectively in
previous studies on transcriptional regulation of the pgs28 gene
[11,14]. In this paper, we study trafficking of the Pgs28 protein and
reveal that cell surface localization of Pgs28 on ookinetes is
critically dependent on two protein motifs, the signal sequence and the
C-terminal hydrophobic domain. As might be expected, the EGF-repeats are
not involved in directing localization of Pgs28.

The N-terminal signal sequence is necessary in the early stages of the
trafficking pathway to the cell surface, for the targeting of nascent
proteins into the ER-Golgi network. Immunofluorescence analysis shows
that, consistent with this role, deletion of the signal sequence results
in diffuse, cytoplasmic localization of a Pgs28-GFP fusion protein.
Elegant work in Plasmodium falciparum has identified a bipartite signal
sequence for the localization of proteins to a specialized organelle,
the apicoplast [15]. The classical signal sequence of the bipartite
motif is required for import into the parasite secretory pathway, while
the plant-like transit peptide specifically targets proteins to the
apicoplast. Similar to our data with the Pgs28 signal sequence, deletion
of the signal peptide of the acyl carrier protein (ACP) results in
cytoplasmic localization of GFP-tagged ACP [15].

Deletion of the C-terminal domain results in retention of Pgs28-GFP
fusion proteins in internal structures, as these proteins lack the
signal for insertion into the surface membrane (GPI-anchor). These
internal structures have been visualized by immunoelectron microscopy
(Fig. 4) and shown to contain Pgs28; these vesicles may be components of
the ER/Golgi of P. gallinaceum ookinetes. Previous studies that
performed electron microscopy on ookinetes and zygotes [16] have
identified vesicular structures corresponding to the ER; here we show
that, indeed, vesicular structures within ookinetes do contain Pgs28
proteins. In our IFA experiments, all ookinetes show cell surface
localization of Pgs28 while a subset of ookinetes does not exhibit
intracellular localization of endogenous Pgs28 (Fig. 3, Panels C and F).
This could be because trafficking of Pgs28 is dynamic and fixation
during immunofluorescence captures ookinetes at different stages of this
process.

Fig. 3. IFA and confocal microscopy of permeabilized cells. Panel A,
cells that were mock transfected without DNA; Panel B, Pgs28-GFP; Panel
C, ΔEGF1-GFP; Panel D, ΔEGF2-GFP; Panel E, ΔC-terminus-GFP; Panel F,
Δ signal sequence-GFP; Panel G, ΔC-terminus-luciferase; Panel H,
ΔSS-luciferase. Red staining indicates endogenous Pgs28 protein and
green staining indicates GFP and luciferase fusion proteins. Arrows
indicate ookinetes that express Pgs28 predominantly on the cell surface.
Panel I, a live ookinete expressing ΔC-terminus-GFP, observed under FITC
filter without immunostaining. Panel J, a live ookinete expressing
ΔSS-GFP, also observed under FITC filter without immunostaining.

The requirement for the signal sequence and C-terminal domain is similar
to results obtained for Pbs21 localization in a heterologous expression
system [6]. Similarly, deletion of the GPI-anchor addition signal of
variant surface glycoproteins (VSG) in trypanosomes resulted in delayed
forward transport of the mutant proteins and retention in the ER [17].
Taken together, these data indicate that the C-terminal domain of Pgs28
contains a signal, presumably the GPI-anchor itself, which allows
efficient passage of the protein through the cellular trafficking
machinery.

The results described above indicate that trafficking of at least one
GPI-anchored protein (Pgs28) in ookinetes follows previously described
paradigms of protein trafficking in parasitic protozoa and other
eukaryotes. These common themes in protein trafficking are underscored
by the fact that identical protein motifs were defined for localization
of two Pxs21/25 proteins to the cell surface in two completely different
systems, Pbs21 in Sf9 insect cells [6] and Pgs28 in P. gallinaceum
ookinetes (this report). Hence, contrary to experiments in P. falciparum
where the promoter of the AMA-1 gene played an important role in
appropriate localization of the protein [18], the precise timing of
expression of Pgs28 and Pbs21 is not critical to localization.

Some cell-specific characteristics do emerge while studying protein
localization in ookinetes. For example, Pgs28 is shed from the surface
of motile ookinetes, most likely in trails that have been previously
identified [9]. This result raises interesting questions regarding a
potential function for Pgs28 in motility or cell-cell interactions
between ookinetes and the mosquito mid-gut membranes. Based upon the
functions of EGF-repeats in other systems [8], the EGF-repeats in Pgs28
will certainly be involved in protein interactions and signaling between
ookinetes and midgut cells. Unpublished knockout experiments of the
pbs21 and pbs25 genes suggest that these members of the Pxs21/25 family
play limited roles in ookinete motility and invasion but may be involved
in oocyst development (Andy Waters, personal communication). Similar
shedding of GPI-anchored VSGs from the surface of trypanosomes has been
reported. The functional significance of this shedding is thought to be
evasion of complement-mediated lysis [19] or replacement of the VSGs on
the surface of T. brucei with procyclin proteins [20]. In P.
gallinaceum, shedding of the membrane along with proteins like Pgs28 may
also be required for immune evasion or replacement with newly
synthesized cell surface proteins.

In conclusion, this work sheds light on the protein signals required for
the transport of Pgs28 to the surface of ookinetes. The data also raise
questions regarding the function of Pgs28 in the development of P.
gallinaceum within its mosquito vector. High efficiencies of
transfection, ready availability of expression vectors and reporter
genes, and applicability of molecular and cell biology techniques will
make P. gallinaceum zygotes and ookinetes an excellent system for
further analysis of the biology of the pgs28 gene and other genes
expressed in Plasmodium sexual stages.

The advent of high-density gene array technology has revolutionized
approaches to drug design, development, and characterization. At the
laboratory level, the efficient, consistent, and dependable exploitation
of this complex technology requires the stringent standardization of
protocols and data analysis platforms. The Affymetrix YE6100 expression
GeneChip platform was evaluated for its performance in the analysis of
both global (6,000 yeast genes) and targeted (three pleiotropic
multidrug resistance genes of the ATP binding cassette transporter
family) gene expression in a heterologous yeast model system in the
presence and absence of the antimalarial drug chloroquine. Critical to
the generation of consistent data from this platform are issues
involving the preparation of the specimen, use of appropriate controls,
accurate assessment of experiment variance, strict adherence to
optimized enzymatic and hybridization protocols, and use of
sophisticated bioinformatics tools for data analysis.


A universal challenge to drug therapy is the development of drug
resistance. Efforts to understand the molecular mechanisms of the
emergence of resistance to drugs span the fields of infectious disease,
cancer, and toxicology. The eventuality of drug resistance necessitates
the ongoing development of new drugs and interventions. A decade of
research has identified a class of genes associated with multidrug
resistance (8, 9).

The multidrug resistance genes (mdr genes) are part of the ATP binding
cassette (ABC) transporter genes in mammalian cells (4, 7, 10). To
facilitate the detection of drug resistance and to expedite the
development of new drugs, several in vitro model systems have been
developed that examine the activity of mdr and ABC transporters. One
such system is the heterologous yeast model in which the genes PDR5,
PDR10, and SNQ2, members of the pleiotropic drug resistance (pdr) family
in yeast, have been associated with drug resistance (2, 9, 10, 15, 16,
17, 18). Observations that there may be 30 or more genes in yeast that
are related by sequence homology to the ABC transporter gene family
complicate the association of drug resistance with a particular gene
(3). The Saccharomyces cerevisiae genome sequencing project revealed 31
ABC genes, which have been classified into six distinct subfamilies
based on phylogenetic analysis (3, 7, 14, 19, 20). The pdr family is the
largest of these subgroups, with 10 members. In total there are 12 ABC
genes that have been associated with modulation of resistance to
xenobiotics to date. The PDR5 gene has been linked to resistance to
cycloheximide, mycotoxins, and cerulenin, and its product has been found
to transport glucocorticoids (2, 3, 4, 10, 13). A second member of the
pdr group, SNQ2, has been found to be linked to resistance to
4-nitrosoquinoline-N-oxide, methyl-nitro-nitrosoguanidine, and metal
ions such as Na+, Li+, and Mn2+ (3, 16, 18). The Δsnq2 Δpdr5 deletion
strain exhibits a more pronounced sensitivity to metal ions and other
drug substrates (3). PDR10 is closely related to PDR5 (65% sequence
identity); however, the functional relatedness of these genes remains to
be determined. Interestingly, PDR10 has been found to localize to the
cell surface like PDR5 and SNQ2 (3, 9).

With the introduction of the Affymetrix yeast expression GeneChip YE6100
platform (YE6100 platform), it has become feasible to plan experiments
to simultaneously assess the changes in the expression patterns of not
only the pleiotropic drug resistance gene family but also 6,000 yeast
genes (5). Previously, Wodicka et al., at Affymetrix, characterized the
basic performance characteristics of a prototype for the YE6100 platform
to generate a global survey of 6,000 yeast genes (22). This platform was
refined and exploited by Cho et al. to survey the complete yeast genome
(6). Holstege et al., using an elegant battery of controls, exploited
the commercially available YE6100 platform to assess the transcriptional
control of yeast cell division (11). Winzeler et al. used a customized
gene array platform for direct allelic scanning of the entire yeast
genome (21).

To test the practical potential of the commercially available YE6100
platform to address drug resistance, a well-defined heterologous yeast
model system was chosen. The expression profiles of two strains of S.
cerevisiae were evaluated in the presence and absence of the
antimalarial drug chloroquine. Strain YPH 499 (499) is wild type and
refractory to the drug chloroquine. Strain YHW 1052 (1052) is a mutant
with deletions in the PDR5, PDR10, and SNQ2 genes and is thus more
susceptible to chloroquine. The aim of this paper is to detail the
technical aspects of the utilization of the YE6100 platform that are
critical to the generation of consistent and reliable gene expression
data in the study of drug resistance. The implementation of the methods
and protocols presented in this paper will facilitate more intensive
efforts to elucidate the details of the molecular interactions involved
in the emergence of drug resistance. Two levels of data analysis, the
global assessment of functional gene families and the targeted
assessment of particular genes, will be addressed to demonstrate the
type of information gleaned from each.

TABLE 1. Cell densities and mRNA yields

                      Cell density (cells/ml) at:                      mRNA (µg)
Time                  Introduction   Harvest,          Harvest,        Control    Treated
point(a)    Strain    of drug        control cultures  treated cultures cultures  cultures

Early       499       4.3 x 10^6     1.7 x 10^7        1.1 x 10^7        9.9      14.7
            1052      4.5 x 10^6     1.5 x 10^7        (illegible) x 10^6  25.5   13.3
Middle      499       4.2 x 10^6     3.2 x 10^7        1.4 x 10^7        6.6      14.4
            1052      4.1 x 10^6     4.2 x 10^7        1.1 x 10^7        8.0      15.4
Late        499       3.0 x 10^6     3.5 x 10^7        2.6 x 10^7       13.1       6.4
            1052      3.2 x 10^6     3.2 x 10^7        1.8 x 10^7        9.4       7.5

(a) Early, 2 h; middle, 3 h; late, 4.5 h. Treated cultures received 1.5,
2.5, and 2.5 mg/ml for the early, middle, and late time points,
respectively.
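
The cell densities in Table 1 can be summarized as the number of population doublings between introduction of the drug and harvest. The sketch below is illustrative only: it assumes that the density at introduction of the drug is the common starting density for both the control and treated cultures, and it omits the one treated-culture value that is illegible in the source. The names `TABLE_1`, `doublings`, and `growth_summary` are hypothetical.

```python
import math

# (time point, strain) -> (density at drug introduction,
#                          control density at harvest,
#                          treated density at harvest), all in cells/ml.
# None marks the value that is illegible in the source table.
TABLE_1 = {
    ("early", "499"): (4.3e6, 1.7e7, 1.1e7),
    ("early", "1052"): (4.5e6, 1.5e7, None),
    ("middle", "499"): (4.2e6, 3.2e7, 1.4e7),
    ("middle", "1052"): (4.1e6, 4.2e7, 1.1e7),
    ("late", "499"): (3.0e6, 3.5e7, 2.6e7),
    ("late", "1052"): (3.2e6, 3.2e7, 1.8e7),
}


def doublings(start_density: float, end_density: float) -> float:
    """Number of population doublings between two cell densities."""
    return math.log2(end_density / start_density)


def growth_summary(table: dict) -> dict:
    """Return (control doublings, treated doublings) for each table row."""
    summary = {}
    for key, (start, control, treated) in table.items():
        control_d = doublings(start, control)
        treated_d = None if treated is None else doublings(start, treated)
        summary[key] = (control_d, treated_d)
    return summary


summary = growth_summary(TABLE_1)
for (time_point, strain), (control_d, treated_d) in summary.items():
    treated_text = "n/a" if treated_d is None else f"{treated_d:.2f}"
    print(f"{time_point:6s} {strain:5s} control {control_d:.2f} treated {treated_text}")
```

Under these assumptions, every treated culture with a legible value completed fewer doublings than its matched control.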

MATERIALS  AND  METHODS 

Strains and media. The strains, 1052 and 499, used in this study were
the kind gifts of Karl Kuchler of The University and Biocenter of
Vienna, Vienna, Austria. The yeast strain 1052 (Δpdr5::TRP1 Δsnq2::hisG
Δpdr10::hisG) was utilized for this study along with its isogenic
parental strain 499 (MATa ade2-101oc his3-Δ200 leu2-Δ1 lys2-801am
trp1-Δ1 ura3-52). Strain 1052 is deficient in three ABC transporters
encoded in the pdr pathway (PDR5, PDR10, and SNQ2). In strain 1052, the
deletion in PDR5 is from nucleotide (nt) +399 through nt +4456. The
deletion in PDR10 is from nt -90 through nt +4307. The deletion in SNQ2
is from nt -6 through nt +3899. The 50% inhibitory concentrations of the
drug chloroquine are 127 mg/ml for 499 and 50.00 mg/ml for 1052 as
determined in nonaerated liquid medium and in solid medium culture. In
liquid culture the 50% inhibitory concentrations of the drug chloroquine
are 4.75 ± 0.75 mg/ml for 499 and 1.38 ± 0.13 mg/ml for 1052. Starter
cultures were taken from colonies lifted from freshly streaked agar
plates and grown overnight (to confluence at 2 x 10^8 cells/ml) at 30°C
and 300 rpm in 5 to 10 ml of yeast-peptone-dextrose medium. The 5- to
10-ml starter cultures were diluted into 1,200 ml of prewarmed and
aerated yeast-peptone-dextrose medium in a 4-liter flask to a density of
1.5 x 10^6 cells/ml. Cultures were grown at 30°C and 300 rpm for 2 h or
until the cell density reached 3.0 x 10^6 cells/ml. At this juncture the
culture was split into two 600-ml aliquots in two prewarmed 2-liter
flasks. Chloroquine was added to the treatment flask to a concentration
of 1.5 or 2.5 mg/ml from a 200-mg/ml concentrated stock of chloroquine
diphosphate salt (Sigma, St. Louis, Mo.) dissolved in sterile
double-distilled water. This solution had a pH of approximately 4.0. An
equal volume of sterile double-distilled water, adjusted to the pH of
the chloroquine solution, was added to the control flask. Table 1 shows
the cell densities from critical points in the growth and treatment of
the cultures used in the study. The assay points in the study are
defined as early (2 h, with or without 1.5 mg of drug per ml), middle
(3 h, with or without 2.5 mg of drug per ml), and late (4.5 h, with or
without 2.5 mg of drug per ml).
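
The volume of 200-mg/ml chloroquine stock needed to bring a 600-ml culture to 1.5 or 2.5 mg/ml follows from a mass balance. The sketch below is an illustration, not the authors' stated procedure; it assumes that the added stock increases the final volume, and the function name is hypothetical.

```python
def stock_volume_ml(target_mg_per_ml: float,
                    stock_mg_per_ml: float,
                    culture_volume_ml: float) -> float:
    """Volume of stock to add so the final mixture reaches the target.

    Mass balance: stock * V = target * (culture_volume + V),
    hence V = target * culture_volume / (stock - target).
    """
    if not 0 < target_mg_per_ml < stock_mg_per_ml:
        raise ValueError("target must be positive and below the stock concentration")
    return target_mg_per_ml * culture_volume_ml / (stock_mg_per_ml - target_mg_per_ml)


# Values taken from the text: 200-mg/ml stock, 600-ml aliquots.
for target in (1.5, 2.5):
    volume = stock_volume_ml(target, 200.0, 600.0)
    print(f"{target} mg/ml target: add {volume:.2f} ml of stock")
```

The simpler approximation that ignores the added volume gives 4.5 ml and 7.5 ml, which differ from the mass-balance values by about 1%.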

Cell harvesting and preparation of poly(A) RNA. Cultures were harvested
identically at three time points: 2, 3, and 4.5 h. It is imperative that
all cultures be treated exactly the same during the harvesting
procedure. The overnight yeast culture was dispensed into 12 50-ml
polypropylene conical tubes (Falcon/Becton Dickinson Labware, Franklin
Lakes, N.J.) and centrifuged in a clinical centrifuge for 5 min at 4°C
and at 2,000 x g. The pellet was resuspended in 5 ml of Tri-Reagent
(Molecular Research Center, Woodlands, Tex.), and an equal volume of
400-µm-diameter acid-washed glass beads was added. The mixture was
vortexed for 1 min. An additional 20 ml of Tri-Reagent was added to the
mixture, and the manufacturer’s instructions for the preparation of
total RNA were followed. Poly(A) RNA (mRNA) was prepared from total RNA
using the Oligotex (Qiagen, Valencia, Calif.) method according to the
manufacturer’s instructions.

cDNA synthesis. Double-stranded cDNA was synthesized in two steps using
the Superscript Choice System (GibcoBRL, Rockville, Md.) and the reverse
transcription primer T7-(dT)24
[5' GGCCAGTGAATTGTAATACGACTCACTATAGGGAGGCGG(T)24 3'] (GENSET Corp., La
Jolla, Calif.). First-strand synthesis was carried out in a 20-µl
reaction mixture. Approximately 3.0 µg of mRNA was annealed to 7 µg of
T7-(dT)24 primer at 70°C for 10 min. Reverse transcription was carried
out at 37°C for 1 h in a mixture with final concentrations of 50 mM
Tris-HCl (pH 8.3), 75 mM KCl, 3 mM MgCl2, 10 mM dithiothreitol, 500 µM
each dATP, dCTP, dGTP, and dTTP, and 20,000 to 30,000 U of Superscript
II reverse transcriptase per ml, and the reaction was terminated by
placing the tube on ice. Second-strand synthesis was carried out in
150 µl, incorporating the entire 20-µl first-strand reaction mixture and
a 130-µl second-strand reaction mixture for final concentrations of
25 mM Tris-HCl (pH 7.5), 100 mM KCl, 5 mM MgCl2, 10 mM (NH4)2SO4,
0.15 mM β-NAD+, 250 µM each dATP, dCTP, dGTP, and dTTP, 1.2 mM
dithiothreitol, 65 U of DNA ligase per ml, 250 U of DNA polymerase I per
ml, and 13 U of RNase H per ml. The mixture was incubated at 16°C for
2 h, whereupon 2 µl of T4 DNA polymerase at 5 U/µl was added and the
incubation was continued at 16°C for 5 min. To terminate the reaction,
10 µl of 0.5 M EDTA was added. The cDNA was purified using
phenol-chloroform-isoamyl alcohol (24:23:1) saturated with 10 mM
Tris-HCl (pH 8.0)-1 mM EDTA (AMBION, Inc., Austin, Tex.). The purified
cDNA was precipitated with 5 M ammonium acetate and absolute ethanol at
-20°C for 20 min. The pellet was resuspended in 7 to 9 µl of RNase-free
water to achieve a final concentration of between 0.25 and 0.65 µg/µl.
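
The structure of the T7-(dT)24 reverse transcription primer quoted above can be checked programmatically: a 5' segment carrying a T7 promoter followed by 24 thymidines that anneal to the poly(A) tail. The sketch below is for illustration; the 5' segment is copied from the text, while the 20-nt sequence used as the T7 promoter reference is the commonly cited consensus and is an assumption not stated in the source. All names are hypothetical.

```python
# 5' segment of the primer as given in the text (without the dT tract).
PRIMER_5PRIME = "GGCCAGTGAATTGTAATACGACTCACTATAGGGAGGCGG"

# Commonly cited T7 promoter sequence (assumption; not given in the source).
T7_PROMOTER = "TAATACGACTCACTATAGGG"


def build_t7_oligo_dt(n_thymidines: int = 24) -> str:
    """Return the full primer: 5' T7-promoter segment plus an oligo(dT) tract."""
    return PRIMER_5PRIME + "T" * n_thymidines


primer = build_t7_oligo_dt()
promoter_start = primer.find(T7_PROMOTER)  # -1 if the promoter is absent

print(f"length: {len(primer)} nt")
print(f"T7 promoter found at 0-based position: {promoter_start}")
print(f"ends with (dT)24: {primer.endswith('T' * 24)}")
```

Because the promoter lies 5' of the oligo(dT) tract, it is carried into the double-stranded cDNA and can drive the in vitro transcription step described next.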

In vitro transcription and fluorescent labeling. Synthesis of
biotin-labeled cRNA was carried out by in vitro transcription using the
MEGAscript T7 In Vitro Transcription Kit (AMBION, Inc.). According to
the manufacturer’s instructions, 0.4 to 1.0 µg of double-stranded cDNA
was placed in a 20-µl reaction mix, at room temperature, containing
Ambion 1x reaction buffer and enzyme mix (proprietary). The labeling mix
consisted of 7.5 mM ATP, 7.5 mM GTP, 5.6 mM UTP, 1.9 mM biotinylated UTP
(ENZO Diagnostics, Farmingdale, N.Y.), 5.6 mM CTP, and 1.9 mM
biotinylated CTP (ENZO). The reaction mixture was incubated at 37°C for
5 h. The biotin-labeled cRNA was purified using RNeasy spin columns
(Qiagen) according to the manufacturer’s protocol. The biotin-labeled
cRNA was fragmented in a 40-µl reaction mixture containing 40 mM
Tris-acetate (pH 8.1), 100 mM potassium acetate, and 30 mM magnesium
acetate, incubated at 94°C for 35 min, and then put on ice. One
microliter of the intact biotin-labeled cRNA and 2 µl of the fragmented
sample were run on a 1% agarose gel to evaluate both the yield and size
distribution of the intact and fragmented products.

Hybridization, staining, and scanning of the GeneChip. The biotin-labeled
and fragmented cRNA was hybridized to the YE6100 Yeast GeneChip array
(Affymetrix, Santa Clara, Calif.) according to the manufacturer’s instructions.
Briefly, a 220-µl hybridization solution of 1 M NaCl, 10 mM Tris (pH 7.6),
0.005% Triton X-100, 50 pM control oligonucleotide B2
(5' bioGTCAAGATGCTACCGTTCAG 3') (Affymetrix), control cRNA (Bio B [150 pM],
Bio C [500 pM], Bio D [2.5 nM], and Cre X [10 nM]) (American Type Culture
Collection, Manassas, Va., and Lofstrand Labs, Gaithersburg, Md.), 0.1 mg of
herring sperm DNA per ml, and 0.05 µg of the fragmented labeled sample cRNA
per µl was heated to 95°C, cooled to 40°C, and clarified by centrifugation
before being applied to each of the four subarrays (A, B, C, and D) that
comprise the YE6100 Yeast GeneChip platform. Hybridization was at 40°C in a
rotisserie hybridization oven (model 320; Affymetrix) at 60 rpm for 16 h.
Following hybridization, the GeneChip arrays were washed 10 times at 25°C with
6× SSPE-T buffer (1 M NaCl, 0.006 M EDTA, 0.06 M Na3PO4, 0.005% Triton X-100,
pH 7.6) using the automated fluidics station protocol. GeneChip arrays were
incubated at 50°C in 0.5× SSPE-T for 20 min at 60 rpm in the rotisserie oven
and then stained for 15 min at room temperature and 60 rpm with streptavidin
phycoerythrin (Molecular Probes, Inc., Eugene, Oreg.) stain solution at a final
concentration of 10 µg/ml in 6× SSPE-T buffer and 1.0 mg of acetylated bovine
serum albumin (Sigma) per ml. The GeneChip arrays were washed twice at room
temperature with 6× SSPE-T buffer and then scanned with a GeneArray Scanner
(Hewlett-Packard, Santa Clara, Calif.), controlled by GeneChip 3.1 software
(Affymetrix).
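As a check on the quantities quoted for the hybridization solution, the following Python sketch works out how much sample cRNA and how much of each control transcript one 220-µl solution contains. The function and constant names are illustrative only, and the calculation assumes the stated concentrations are final concentrations in the 220-µl volume.

```python
# Worked arithmetic for one 220-µl hybridization solution, using values from the text.
HYB_VOLUME_UL = 220.0     # volume of hybridization solution, µl
CRNA_UG_PER_UL = 0.05     # fragmented labeled sample cRNA, µg per µl

# Control cRNA concentrations in pM (2.5 nM = 2,500 pM; 10 nM = 10,000 pM).
CONTROL_PM = {"Bio B": 150.0, "Bio C": 500.0, "Bio D": 2500.0, "Cre X": 10000.0}


def sample_crna_ug(volume_ul: float, ug_per_ul: float) -> float:
    """Mass of sample cRNA (µg) contained in the given volume."""
    return volume_ul * ug_per_ul


def control_fmol(conc_pm: float, volume_ul: float) -> float:
    """Amount of a control transcript in fmol.

    1 pM x 1 µl = 1e-12 mol/l x 1e-6 l = 1e-18 mol = 1e-3 fmol.
    """
    return conc_pm * volume_ul * 1e-3


if __name__ == "__main__":
    print(f"sample cRNA: {sample_crna_ug(HYB_VOLUME_UL, CRNA_UG_PER_UL):.1f} µg")
    for name, pm in CONTROL_PM.items():
        print(f"{name}: {control_fmol(pm, HYB_VOLUME_UL):.0f} fmol")
```

Under these assumptions the solution carries about 11 µg of sample cRNA, and the four controls span roughly a 67-fold range in amount.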

Assay monitoring and controls. The TEST 1 GeneChip (Affymetrix) was used
according to the manufacturer’s instructions to assess critical features of the
mRNA preparations and the cDNA generated from the yeast strains and to
evaluate the stringency of staining and hybridization. In addition, a battery of
three types of GeneChip controls present on the TEST 1 GeneChip and on each
of the four arrays in the YE6100 GeneChip set was employed according to the
manufacturer’s instructions. Details of the use and performance of these critical
controls are given in Results. A method of mathematical scaling was employed
by the GeneChip 3.1 software (Affymetrix) to normalize the fluorescence signal
from each probe cell on each GeneChip and thus facilitate the reliable
comparison of data from independent experiments.
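The text does not describe the scaling method used by the GeneChip 3.1 software. The Python sketch below shows one common form of global scaling, in which every intensity on a chip is multiplied by a single factor so that the chip mean matches a chosen target. The function name, the target value of 100, and the use of a plain (untrimmed) mean are assumptions made for illustration, not the vendor's algorithm.

```python
def scale_to_target(intensities, target_mean=100.0):
    """Return intensities rescaled so that their mean equals target_mean.

    Assumption: one multiplicative factor per chip, based on the plain mean.
    """
    if not intensities:
        raise ValueError("no intensities supplied")
    chip_mean = sum(intensities) / len(intensities)
    if chip_mean <= 0:
        raise ValueError("chip mean must be positive")
    factor = target_mean / chip_mean
    return [value * factor for value in intensities]


# Hypothetical readings: chip B is uniformly twice as bright as chip A.
chip_a = [50.0, 100.0, 150.0]
chip_b = [100.0, 200.0, 300.0]

scaled_a = scale_to_target(chip_a)
scaled_b = scale_to_target(chip_b)
```

After scaling, the two hypothetical chips carry the same values, so a uniform difference in overall brightness no longer appears as a change in expression.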

FIG. 1. Performance of the battery of GeneChip controls with two independent
preparations of templates from strains 1052 and 499. The ordinate shows the
relative fluorescence values for each of the control markers listed on the
abscissa. The linear regression r² value for the standard curve generated by
the Bio B, Bio C, Bio D, and Cre X markers is 0.86.

Data analysis algorithm for the assessment of variance. The Affymetrix raw
data set was scrutinized to eliminate any transcripts with fewer than 50% of
probe cells contributing to the data. Subsequently, the first step in raw data
mining for the assessment of variance captured all gene transcripts that were
present on both GeneChips being compared (PP data set). The second step
required that a decision be made to define what degree of change would be
considered significant. We chose to approach this issue objectively, using a
distribution analysis of the complete PP data set which defined a mean for the
population of values and subsequently determined quartile percentages of 25, 50,
and 75% above and below that mean. For the assessment of variance, outliers
were defined as values exceeding the mean by 10-fold and were eliminated from
the data set. When the PP data set was examined in this way, a value of 3.0-fold
was determined to be the cutoff for a reliable change in expression. The value of
3.0-fold was applied to all subsequent analyses. Variances between GeneChips
(intraexperimental variance) and between independent mRNA targets
(interexperimental variance) were assessed by scoring the percentage of PP
transcripts that exhibit no change relative to the total number of PP
transcripts.
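The variance assessment described in this section can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the `Transcript` record, its field names, and the example values are hypothetical, signals are assumed to be positive, and "exceeding the mean by 10-fold" is read here as a fold change more than 10 times the mean fold change of the PP set.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    name: str
    present_a: bool        # called present on the first GeneChip
    present_b: bool        # called present on the second GeneChip
    signal_a: float        # scaled intensity on the first GeneChip
    signal_b: float        # scaled intensity on the second GeneChip
    probe_fraction: float  # fraction of probe cells contributing to the data


def fold_magnitude(a: float, b: float) -> float:
    """Symmetric fold change (>= 1); 3.0 means threefold up or down."""
    return max(a, b) / min(a, b)


def percent_unchanged(transcripts, cutoff=3.0, outlier_factor=10.0):
    """Percentage of PP transcripts whose fold change is below the cutoff."""
    # Step 0: drop transcripts with fewer than 50% of probe cells contributing.
    usable = [t for t in transcripts if t.probe_fraction >= 0.5]
    # Step 1: keep transcripts present on both chips (PP data set).
    pp = [t for t in usable if t.present_a and t.present_b]
    folds = [fold_magnitude(t.signal_a, t.signal_b) for t in pp]
    if not folds:
        raise ValueError("no PP transcripts to assess")
    # Step 2: remove outliers (assumed reading of the 10-fold rule).
    mean_fold = sum(folds) / len(folds)
    kept = [f for f in folds if f <= outlier_factor * mean_fold]
    # Step 3: score the share of transcripts with no reliable change.
    unchanged = sum(1 for f in kept if f < cutoff)
    return 100.0 * unchanged / len(kept)


example = [
    Transcript("g1", True, True, 100.0, 110.0, 0.9),
    Transcript("g2", True, True, 100.0, 400.0, 0.8),   # fourfold change
    Transcript("g3", True, False, 100.0, 20.0, 0.9),   # not PP, excluded
    Transcript("g4", True, True, 100.0, 900.0, 0.4),   # too few probe cells
    Transcript("g5", True, True, 200.0, 180.0, 1.0),
    Transcript("g6", True, True, 50.0, 60.0, 0.7),
]
unchanged_pct = percent_unchanged(example)
```

The variance between two runs is then 100 minus the percentage returned.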

Data analysis algorithm for interrogation. For experiments in which
differences in expression profiles between the drug-treated and untreated yeast
strains were examined, the data analysis captured data from genes that were
present in both cases (PP data set), as well as genes present in one case and
absent in the other (PA or AP data set). All values above the 3.0-fold cutoff
were included in the analysis of experimental expression profiles. The
experimental design employed the analysis of data from the untreated control as
a baseline for comparison to the treated strain in all cases. The cumulative
fold change for the expression of all genes in a particular functional family
was the sum of the levels of change of gene expression, using the values for
the untreated strain as the control.
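The cumulative fold change just described can be illustrated with a short Python sketch. The sign convention (positive for increases, negative for decreases relative to the untreated control), the tuple layout, and the example values are assumptions for illustration. Genes in the PA or AP sets have no defined ratio and are left out of this sketch.

```python
from collections import defaultdict


def signed_fold(treated: float, untreated: float) -> float:
    """Fold change relative to the untreated control.

    Increases are positive (treated / untreated); decreases are negative
    (-(untreated / treated)). Both signals are assumed to be positive.
    """
    if treated >= untreated:
        return treated / untreated
    return -(untreated / treated)


def cumulative_fold_by_family(genes, cutoff=3.0):
    """Sum signed fold changes per functional family.

    `genes` is an iterable of (family, treated_signal, untreated_signal).
    Only changes of at least `cutoff`-fold are included.
    """
    totals = defaultdict(float)
    for family, treated, untreated in genes:
        change = signed_fold(treated, untreated)
        if abs(change) >= cutoff:
            totals[family] += change
    return dict(totals)


# Hypothetical signals for a few genes in two functional families.
example_genes = [
    ("MEM", 500.0, 100.0),   # 5-fold increase
    ("MEM", 50.0, 200.0),    # 4-fold decrease
    ("MEM", 120.0, 100.0),   # 1.2-fold, below the cutoff and ignored
    ("RIBO", 900.0, 100.0),  # 9-fold increase
]
family_totals = cumulative_fold_by_family(example_genes)
```

Because increases and decreases carry opposite signs in this sketch, a family with large changes in both directions can show a small cumulative value.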

Bioinformatics analyses. GeneSpring version 2.1 (Silicon Genetics, San Carlos,
Calif.) was used to derive global trends in the expression profiles and to
specifically assess the expression patterns of the pdr gene targets. We used the
temporal analysis of all of the raw data from the Affymetrix platform normalized
to a single mean by the GeneSpring software.

RESULTS 

Consistent cell harvests and mRNA yields. Table 1 shows
the yields of cells and of mRNA across the three time points of
the experiment and at the two concentrations of chloroquine
used in the study. The amounts of cells harvested were
comparable at all time points.

Assessment of GeneChip performance. A battery of controls
was used for all experiments. Three types of GeneChip controls
are present on the TEST 1 GeneChip and on each of the
four GeneChips in the YE6100 set. The first set of controls
consists of four synthetically generated plasmid templates that
are subjected to reverse transcription to incorporate fluorescent
label according to the manufacturer's instructions (Affymetrix).
These four cRNA templates, Bio B, Bio C, Bio D,
and Cre X, are mixed in a cocktail to generate final concentrations
of 150 pM, 500 pM, 2.5 nM, and 10 nM, respectively.
These concentrations generate a standard curve and can thus
be used to standardize interexperimental variation and efficiency
of cDNA synthesis and labeling and to provide the
dynamic range of the assay. Ultimately, the standard curve
generated by these templates can be used to quantitate the
level of RNA expression for a given gene on a per-cell basis.
The second set of controls used on the GeneChip assesses the
efficiency of cDNA synthesis by quantitating the amounts of 3'
and 5' portions of target sequences generated during cDNA
synthesis by assessing the expression of the yeast actin gene.
Optimal synthesis reactions will generate equivalent amounts
of signal in the 3' and 5' targets. The third set of
GeneChip controls involves the evaluation of the integrity of
the mRNA preparation used in the analysis and reports the
GeneChip-based determination of equivalent amounts of
mRNA used in the test. This is achieved by the assessment of
the 18S rRNA gene expression profile, which is divided on the
GeneChip into five sets of probe cells or segments (a through e).
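The standard curve built from the four control templates can be sketched in Python as a least-squares line with its r² value. The concentrations come from the text; the intensity readings are hypothetical, and fitting on log10-transformed axes is an assumption, since the text specifies only that the regression is linear.

```python
import math

# Final concentrations of the control templates, in pM (from the text).
CONTROL_PM = {"Bio B": 150.0, "Bio C": 500.0, "Bio D": 2500.0, "Cre X": 10000.0}


def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, with r squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)  # requires non-constant ys
    return slope, intercept, r_squared


def fit_standard_curve(intensity_by_control):
    """Fit log10(intensity) against log10(concentration in pM)."""
    xs = [math.log10(CONTROL_PM[name]) for name in intensity_by_control]
    ys = [math.log10(value) for value in intensity_by_control.values()]
    return linear_fit(xs, ys)


def estimate_pm(intensity, slope, intercept):
    """Invert the fitted curve to estimate a concentration from an intensity."""
    return 10 ** ((math.log10(intensity) - intercept) / slope)


# Hypothetical relative fluorescence readings, for illustration only.
example = {"Bio B": 300.0, "Bio C": 800.0, "Bio D": 4200.0, "Cre X": 12000.0}
slope, intercept, r_squared = fit_standard_curve(example)
```

An r² close to 1 indicates that signal tracks concentration across the range of the controls; `estimate_pm` then gives a rough, semiquantitative reading for any probe set within that range.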

The results of the analysis of TEST 1 GeneChip controls for
two independent evaluations of strains 499 and 1052 are shown
in Fig. 1. The ordinate indicates the relative fluorescence
intensity reported by the GeneArray Scanner. The data from the
18S rRNA series show less than a twofold range in segment a
and no significant difference in segment b, c, or d, except for
the 1052 data point, which is less than onefold lower in
segment c. This data set supports the hypothesis that equivalent
amounts of mRNA were used in the cDNA reaction in
preparation for GeneChip analysis. Also shown in Fig. 1 are the
results of the assessment of 3' and 5' segments of the actin
gene expression. There is no significant difference between the
fluorescence values for the 3' end of the yeast actin gene and
for the 5' end of the yeast actin gene in this experiment. This
data set indicates an optimal yield from the cDNA synthesis
reaction. The manufacturer (Affymetrix) suggests that the yield
of 3' product may vary by as much as fourfold. In our hands,
optimization of the cDNA synthesis step routinely yielded less
than a 0.5-fold difference between 3' and 5' segments.

TABLE 2. Descriptive statistical analysis of the GeneChip control platform

Control type                      n(a)  Relative fluorescence intensity(b) with strain:
                                        1052                       499
Bio B, Bio C, Bio D, and Cre X    20    4.13 ± 0.53 (r² = 0.83)    3.27 ± 0.56 (r² = 0.81)
18S ribosomal genes (a-e)         40    0.98 ± 0.38                1.04 ± 0.43
Yeast actin, 3' and 5'            36    2.74 ± 0.55                3.04 ± 0.55

(a) Total number of data points in the analysis.
(b) Data are expressed as the median ± the standard error. Regression values for the standard curve are for linear regression calculations.

The standard curve generated by the synthetic templates Bio
B, Bio C, Bio D, and Cre X is shown in Fig. 1. The curve has
an r² value of 0.86 and was remarkably consistent between
strains, between GeneChips, and for two independent template
preparations. Table 2 summarizes data on the performance
of the battery of the three sets of controls that were
generated by between 20 and 40 independent GeneChip
assessments. A descriptive statistical analysis of the data set
shows stringent inter- and intraexperimental consistency.

Assessment of assay variance. Table 3 presents data on the
results of two independent expression profiles for each strain,
499 and 1052, in the absence of drug. These data were generated
using one of the four GeneChips that comprise the complete
YE6100 GeneChip platform (GeneChips A through D).
In each case an independent growth and harvest of yeast cells
followed by an independent preparation of GeneChip-ready
template was carried out. Genes were scored as being present
in both sets of data (PP), exhibiting no change in expression
between the two sets of data, having increased or decreased,
and, finally, having increased or decreased by threefold. For
strain 1052, the total number of PP genes was 1,450, of which
32 increased by threefold, 116 decreased by threefold, and
1,302 (89.8%) remained unchanged, thus generating a variance
between the two runs of 10.2%. For strain 499, the total number
of PP genes was 1,439, of which 72 increased by threefold,
153 decreased by threefold, and 1,214 (84.4%) remained
unchanged, thus generating a variance between the two runs of
15.6%. To further reduce these levels of interexperimental
variance, the original culture was split into two cultures and
reassessed for percentage of variance. As a result of splitting
the original culture in this way, rather than growing two
side-by-side cultures, the variance was reduced to zero for both
strains, since there were no genes that changed by more than
threefold between the two runs.
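The variance figures quoted above follow directly from the gene counts. The Python sketch below reproduces the arithmetic; the function name and the dictionary keys are illustrative.

```python
def run_to_run_variance(pp_total: int, increased_3fold: int, decreased_3fold: int):
    """Summarize the variance between two runs from counts of PP genes."""
    changed = increased_3fold + decreased_3fold
    unchanged = pp_total - changed
    return {
        "changed": changed,
        "unchanged": unchanged,
        "percent_unchanged": 100.0 * unchanged / pp_total,
        "percent_variance": 100.0 * changed / pp_total,
    }


# Counts reported in the text for the two independently grown samples.
strain_1052 = run_to_run_variance(pp_total=1450, increased_3fold=32, decreased_3fold=116)
strain_499 = run_to_run_variance(pp_total=1439, increased_3fold=72, decreased_3fold=153)

for label, result in (("1052", strain_1052), ("499", strain_499)):
    print(f"strain {label}: {result['changed']} changed, "
          f"variance {result['percent_variance']:.1f}%")
```

For strain 1052 this gives 148 changed genes out of 1,450 (10.2%), and for strain 499 it gives 225 out of 1,439 (15.6%), matching the reported values.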


Global expression profiles of strains 1052 and 499 in the
presence and absence of chloroquine. Shown in Fig. 2 and 3 are
the results of a global survey of the 6,000 genes on the YE6100
GeneChip platform as assessed in strains 1052 and 499,
respectively, in the presence and absence of the drug chloroquine and
at each of the three time points and two drug concentrations
used in the study. The control in each case was the value from
the strain in the absence of the drug. Cumulative fold change
values for the functional families are arrived at by simple
summation of the levels of change from the control for each
gene in a functional family.

As compared with the middle and late time points, the early
time points for both 1052 and 499 exhibit a lower level of
expression, with some increase in genes associated with
membranes in strain 499. At the middle time point, however, both
strains exhibit an increase in gene expression, with few genes
showing a decrease. Genes associated with membranes,
metabolism, and ribosomes showed the most increase in strain
1052 at the middle time point. The levels of the cumulative
increase in expression were 2- to 10-fold higher in strain 499 at
the middle time point. Increases in the expression of genes
associated with membranes, metabolism, and ribosomes were
similar in pattern but greater in magnitude than the changes at
this time point in strain 1052. The most dramatic change
occurred in strain 499 at the middle time point in the increase in
expression of genes associated with synthetic pathways. In
strain 1052, the late time point data set was dominated by a
large decrease in the expression of genes associated with
membranes. In contrast to the case for the two earlier time points,
most expression levels were reduced in strain 1052 at the late
time point. The expression of genes in strain 499 was also
decreased at the late time point compared with the two earlier
time points. The largest decline in expression was in the genes
associated with translation and transcription.

Targeted expression profiles of the pdr genes PDR5, PDR10,
and SNQ2 in strains 1052 and 499 in the presence and absence
of chloroquine. Figure 4 shows the expression profiles at three
time points and in the absence or the presence of two different
concentrations of the antimalarial drug chloroquine. The
expression of the gene PDR5 was decreased in the 1052 mutant
strain in the presence and absence of the drug. In contrast, the
expression of the gene PDR10 was increased in strain 1052 in
the presence and the absence of chloroquine. The expression
of the gene SNQ2 was moderate but level in strain 1052 in the
presence of drug and moderate with a minor increase in slope
in the absence of the drug. The wild-type strain 499 exhibited
an increase in the expression levels of PDR5 in the presence of
drug but not in the absence of drug. In the absence of drug, the
expression of the gene PDR5 was moderate and level across all
time points. The expression levels of PDR10 and SNQ2 in
strain 499 remained low and level in both the presence and
absence of the drug.

TABLE 3. Calculation of variance for two independently grown and tested samples of each strain

        No. of genes    No. (%) of genes:                                                                   Variance
        present in      With no      That         That         With <3-fold   That increase  That decrease
Strain  both profiles   change       increase     decrease     or no change   ≥3-fold        ≥3-fold        No.   %
1052    1,450           786 (54.2)   394 (27.2)   270 (18.6)   1,302 (89.8)   32 (2.2)       116 (8.0)      148   10.2
499     1,439           690 (47.9)   421 (29.3)   328 (22.8)   1,214 (84.4)   72 (5.0)       153 (10.6)     225   15.6

FIG. 2. Cumulative change of gene expression levels in the mutant strain 1052 in the presence of chloroquine. The ordinate shows the cumulative fold changes for
the expression levels of genes categorized by the functional family designation shown on the abscissa. The functional families are cell cycle and division proteins (CCD),
drug resistance membrane proteins (DRM), kinases (KIN), membrane proteins (MEM), metabolic pathway proteins (MET), ribosomal proteins (RIBO), respiratory
chain proteins (RSP), synthetic metabolic pathways (SMP), transcription and translation proteins (TRAN), pathology-related proteins (PATH), and stress-related
proteins (SR). The expression level of genes in the untreated sample is used to determine the baseline for the degree of change of gene expression. The profiles for
the early time point, the middle time point, and the late time point are shown.

DISCUSSION 

Template preparation. Several approaches to the extraction
of total RNA and the subsequent preparation of mRNA are
currently available. We found that the combination of two
commercially available kits, the Tri-Reagent and Qiagen
Oligotex kits, gave the most dependable results with yeast. The most
critical aspects of the preparation of template for the
Affymetrix GeneChip YE6100 platform are the quality of the
mRNA and the degree to which it is representative of the
biological nature of the sample. To ensure a representative
sample, it is imperative to standardize the growth and handling
of the yeast cultures. Holstege et al. first suggested that the
attention to detail involved in the growth and harvest of yeast
cultures for expression profiling was critical to the dependability
of the data generated (11). We confirm and extend that
observation by emphasizing the added importance of standardizing
the treatment of these strains with the drug chloroquine
and minimizing experimental variance by splitting single
cultures for drug treatment. It is imperative to ascertain the
phenotypes of the wild-type and mutant strains in the presence of
a drug prior to the characterization of the expression profiles
generated as a result of treatment with that drug.

Quality control and assessment. The Affymetrix GeneChip
YE6100 is exquisitely sensitive and necessitates the use of
powerful controls to assure that all aspects of the procedure
are consistent and reliable. Of the four types of controls
available for expression profiling using the Affymetrix GeneChip
YE6100, we chose to apply three. The only control that we did
not utilize involved the addition of synthetic total RNA
template to the RNA samples extracted from the yeast strains.
Instead, we chose to use data from the 3' and 5' ends of the
yeast actin gene as a more accurate and less intrusive measure
of the yield, quality, and representative nature of the mRNA.
The data generated by these controls result directly from the
sample tested and are not enhanced or quenched by the
presence of artificial template.

We have determined that the battery of three controls that
we routinely employ is essential to the interpretation,
consistency, and reliability of expression profiling experiments.
Perhaps the most powerful of the sets of controls is the standard
curve generated by the synthetic templates Bio B, Bio C, Bio
D, and Cre X. These data points offer the investigator the
power to express GeneChip data on a semiquantitative level.
The 18S rRNA series and the yeast actin 3' and 5'
end targets provide critical information on the preparation of
the RNA and on the representative quality of the cDNA
subsequently produced. The fact that all of these controls reside
on each GeneChip further supports and ensures the generation
of dependable data both within and between experiments.
Most importantly, remarkably low levels of intraexperimental
variance can be achieved, despite the enormous number of
complex steps involved in generating an expression profile, by
faithful attention to optimized laboratory protocols and by the
vigilant use of the battery of GeneChip controls.

FIG. 3. Cumulative change of gene expression levels in the wild-type strain 499 in the presence of chloroquine. The ordinate shows the cumulative fold change for
the expression levels of genes categorized by the functional family designation shown on the abscissa. The functional families are cell cycle and division proteins (CCD),
drug resistance membrane proteins (DRM), kinases (KIN), membrane proteins (MEM), metabolic pathway proteins (MET), ribosomal proteins (RIBO), respiratory
chain proteins (RSP), synthetic metabolic pathways (SMP), transcription and translation proteins (TRAN), pathology-related proteins (PATH), and stress-related
proteins (SR). The expression level of genes in the untreated sample is used to determine the baseline for the degree of change of gene expression. The profiles for
the early time point, the middle time point, and the late time point are shown.

Interpretation of GeneChip expression profiles. We
employed a well-characterized heterologous yeast model to assess
the impact of the drug chloroquine on the yeast pdr genes
PDR5, PDR10, and SNQ2. We assessed the expression profile
data on two levels: (i) the global analysis of cumulative changes
in expression of genes classified into broad functional families
and (ii) the targeted expression analysis of the three pdr genes
across the three time points and two drug concentrations used
in the study. Jelinsky and colleagues used the global
assessment of expression profiles to assess changes in gene
expression in yeast in response to alkylating agents (12). Alon and
colleagues employed targeted expression and cluster analysis
to define expression patterns in colon tumors (1).

The assessment of the global alterations in expression
profiles of broadly defined functional families in each of the strains
in the presence of drug clearly shows that in the mutant, the
functional family most significantly affected by the drug is the
membrane protein group. Strain 1052 exhibits a 250-fold
reduction in the cumulative gene expression in the membrane
protein group. The functional family of drug resistance-related
membrane proteins is also reduced in cumulative gene
expression by 75-fold. In contrast to these observations, the wild-type
strain exhibits an increase in the expression of
membrane-associated proteins and, most significantly, in proteins involved
with the processes of transcription and translation. By the late
time point, the wild-type strain exhibits a 100-fold decrease in
the expression of proteins related to transcription and
translation. Clearly these two strains respond with distinct strategies
to the presence of drug. The assessment of the degree of
cumulative change in the expression profiles of broadly defined
functional families of genes can be readily made from the data
reported by the Affymetrix GeneChip YE6100 platform. This
information is most useful in suggesting the focus of further
data mining to elucidate the specifics of a biological pathway
affected by the drug.

The GeneSpring bioinformatics platform commercially
available from Silicon Genetics interrogates the Affymetrix
GeneChip YE6100 data in a significantly more powerful way.
This tool allows for the identification of the patterns and
magnitude of expression of any single gene assessed by the
Affymetrix GeneChip YE6100 over the course of the study. The
expression profile of individual targeted genes as well as the
patterns or clusters of related genes can also be elucidated by
the analysis. In the model system employed in this study, the
promoter region of the PDR10 gene was disrupted. An
unchanged or reduced expression of this gene might be predicted
as a result of this deletion. The expression profiles derived by
GeneSpring analysis of the PDR10 gene in the mutant strain
1052 exhibit an unexpectedly high level of expression in both
the presence and absence of chloroquine. Several explanations
for this observation can be proposed.

FIG. 4. Expression profiles of the yeast pdr genes PDR5, PDR10, and SNQ2. The ordinate shows the relative fluorescence intensity for (i) each of the study time
points (early [E], middle [M], and late [L]), (ii) the two experimental treatments (drug treated [T] and untreated [U]), and (iii) each of the two strains (1052 and 499),
as shown on the abscissa.

The elevated levels of the mutant PDR10 gene expression
may reflect the bias of the GeneChip to assess the 3' region of
a gene. It is important to take into account that the Affymetrix
GeneChip YE6100 platform interrogates 25-mer regions that
cover the last 600 bp of the 3' end of the gene (5). This region
is distal to the deletion made at the 5' promoter region of the
PDR10 gene. Alternatively, there may be a difference in the
efficiency of the promoter region, or in the stability or rate of
turnover of the gene product, in the mutant as compared to
that of the intact gene in the wild-type strain. In the wild-type
strain, there is an increase in the production of PDR5 in
response to drug treatment, while the PDR10 and SNQ2
expression levels remain moderate and unchanged, respectively. This
pattern may reflect the specificity of the PDR5 response to the
drug chloroquine in this strain (9). In contrast, expression
levels of PDR5 and SNQ2 in the mutant strain show little or no
response to the presence of the drug. Mechanistic explanations
of the biological function of the gene products of PDR5,
PDR10, and SNQ2 in the mutant and wild-type strains warrant
further investigation. These observations show the complexity
of the interpretation of expression profile data and underscore
the necessity of ascertaining, by an independent assessment,
information on the functional status of a gene target.

In summary, the utilization of optimized laboratory
protocols, monitored by stringent controls, generates a powerful
data set from the Affymetrix Expression GeneChip platform.
The interpretation of the patterns and magnitudes of
expression profiles represented in the data set requires the
application of bioinformatics tools and a fundamental knowledge of
the model being examined. The power of the method resides in
the sensitivity, accuracy, and speed with which the expression
of over 6,000 genes in response to experimental conditions can
be simultaneously assessed. Confirmation of the trends
observed in the data generated by expression profiling serves as a
point of departure for further analysis of gene function and
thus of the molecular mechanisms of drug action.

Abstract

The advent of high-throughput methods for the analysis of global gene expression, together with the Malaria Genome Project,
opens up new opportunities for furthering our understanding of the fundamental biology and virulence of the malaria parasite.
Serial analysis of gene expression (SAGE) is particularly well suited for malarial systems, as the genomes of Plasmodium species
remain to be fully annotated. By simultaneously and quantitatively analyzing mRNA transcript profiles from a given cell
population, SAGE allows for the discovery of new genes. This study reports the successful application of SAGE in
Plasmodium falciparum 3D7 strain parasites, from which a preliminary library of 6880 tags corresponding to 4146 different genes
was generated. It was demonstrated that P. falciparum is amenable to this technique, despite the remarkably high A-T content
of its genome. SAGE tags as short as 10 nucleotides were sufficient to uniquely identify parasite transcripts from both nuclear and
mitochondrial genomes. Moreover, the skewed A-T content of parasite sequence did not preclude the use of enzymes that are
crucial for generating representative SAGE libraries. Finally, a few modifications to DNA extraction and cloning steps of the
SAGE protocol proved useful for circumventing specific problems presented by A-T rich genomes. © 2001 Elsevier Science B.V.
All rights reserved.

Keywords: Plasmodium falciparum; Malaria; Serial analysis of gene expression; Genomics

Abbreviations: AE, anchoring enzyme; BLAST, basic local alignment
search tool; bp, base pairs; DMSO, dimethylsulfoxide; LoTE,
low Tris-EDTA solution; ORF, open reading frame; PBS, phosphate
buffered saline; PCI, phenol:chloroform:isoamyl alcohol in a 25:24:1
ratio; PCR, polymerase chain reaction; SAGE, serial analysis of gene
expression; SDS-PAGE, sodium dodecyl sulfate polyacrylamide gel
electrophoresis; µl, microliter.


1. Introduction

The malarial parasite, Plasmodium falciparum, infects
approximately 250 million people worldwide and kills
almost 2 million of these individuals, mostly young
children in Africa, annually [1]. The pathogen’s success
can  be  largely  attributed  to  its  ability  to  effectively 
evade  host  immunity,  develop  rapid  resistance  to  anti- 
malarial  compounds,  and  complete  a  complex  life  cycle 
in  both  the  human  host  and  mosquito  vector.  With  the 
absence  of  successful  vaccination  and  a  paucity  of 
chemotherapeutic  drugs,  it  is  evident  that  insight  into 
parasite  biology  is  vital  for  developing  knowledge- 
based  strategies  against  a  disease  that  plagues  most  of 
the  developing  world. 

With  this  aim  in  mind,  global  malaria  initiatives  have 
launched  the  Malaria  Genome  Project  [2-4]  in  order  to 
sequence  the  entire  genome  of  the  3D7  strain  of  P. 
falciparum. By defining every single gene in the protozoan
parasite, the Genome Project seeks to uncover
virulence  factors  as  well  as  new  targets  for  vaccine  and 
drug  development.  The  genome  spans  approximately 
24.6  Mb,  consists  of  14  chromosomes,  and  is  highly 
A-T  rich  (70-80%  A-T  content).  At  least  80%  of  the 
estimated  7000  genes  are  at  least  partially  sequenced. 
Chromosomes 3 [5] and 2 [6] were completely sequenced
recently; preliminary analysis of these predicts
that 60% of coding regions in the malaria genome will
have unknown function. The percentage of unidentified
open reading frames (ORFs) in the parasite is significant
(compared to other genomes) and may reflect
Plasmodium's unique requirement for novel genes during
host-parasite interactions, or evasion of immune
and drug pressure.

Relating genomic sequence to function and ultimately
malarial biology is the next logical step. One
approach involves investigating transcriptional profiles
in Plasmodium at the level of the entire genome. In this
manner, complex processes involving the interaction of
multiple genes, such as stage-specific differentiation or
response to drug pressure, can be dissected. Such global
analysis has, in fact, been made possible with the
development of high-throughput techniques in other
systems; these include differential display [7], microarrays
[8], and serial analysis of gene expression or SAGE
[9]. Two of these technologies have been applied in
malarial systems. Differential display in drug-resistant
strains of P. falciparum identified two genes specifically
induced under chloroquine pressure [10], while differences
in mRNA expression between sexual and asexual
blood stages were recently characterized by shot-gun
microarrays [11]. In the present study, the SAGE
technique for P. falciparum has been developed and
optimized.

Both SAGE and microarrays are extremely powerful
techniques with which to characterize differential gene
expression on a global scale. ‘Closed’ profiling platforms
such as microarrays provide rapid means of
screening large numbers of experimental samples; however,
the expression data are limited to a pre-determined
or known set of genes being screened. On the other
hand, ‘open’ platforms such as SAGE can identify
expressed genes that have not yet been cloned, genes
that are partially sequenced, or novel genes that cannot
be identified from sequence information alone. As such,
SAGE is particularly well suited for Plasmodium species
whose genomes are not completely annotated. Moreover,
by qualitatively and quantitatively analyzing
thousands of transcripts from a given population at the
same time, SAGE achieves a greater depth of coverage
and readily detects low-abundance transcripts. Furthermore,
the technique may prove useful for identifying the
function of many of the unknown genes catalogued by
the Genome Project as well as assigning ORFs to
previously uncharacterized genome sequence reads.


SAGE is based on three experimentally confirmed
principles [9]. First, a short (10 bp) tag from a defined
position within a transcript can uniquely identify a
gene. This follows from the fact that the maximum
number of possible tag sequences, assuming a random
nucleotide distribution, is far greater than the number
of estimated genes in most organisms. Second, concatenation
of several tags into a single molecule allows for
efficient sequencing and acquisition of data. Third,
expression patterns of induced or repressed genes are
accurately represented by the abundance of their corresponding
tags.
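
The first principle can be illustrated with a short calculation. The sketch below is not part of the SAGE protocol; it simply counts the possible 10 bp tags and compares that number with the estimate of 7000 genes quoted above. The function names are hypothetical, and the second function uses an inverse collision probability as one rough heuristic for how a skewed base composition shrinks the usable tag space (the 80% A-T figure is taken from the range given in the Introduction).

```python
# Illustrative arithmetic only; names and the skew heuristic are assumptions.

def possible_tags(tag_length: int) -> int:
    """Number of distinct tags over {A, C, G, T} with uniform base usage."""
    return 4 ** tag_length


def effective_tags(at_fraction: float, tag_length: int) -> float:
    """Rough 'effective' tag count when A/T and C/G are used unevenly.

    Uses the inverse of the per-position collision probability, a heuristic
    chosen here for illustration rather than a figure from the study.
    """
    p_at = at_fraction / 2          # probability of A (and of T)
    p_cg = (1 - at_fraction) / 2    # probability of C (and of G)
    collision = 2 * p_at ** 2 + 2 * p_cg ** 2
    return (1 / collision) ** tag_length


ESTIMATED_GENES = 7000  # estimate quoted in the Introduction

print(possible_tags(10))                     # 1048576
print(possible_tags(10) // ESTIMATED_GENES)  # 149 possible tags per gene
print(effective_tags(0.8, 10) > ESTIMATED_GENES)  # True
```

Even under the skewed-composition heuristic, the tag space remains larger than the estimated gene number, which is in line with the observation reported here that 10 bp tags were sufficient.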

A brief description of the generation and isolation of
SAGE tags follows (see Fig. 1). A more detailed account
of SAGE can also be obtained from Velculescu et
al. [9]. cDNA from the population of interest is digested
with an anchoring enzyme (AE) that is expected to
cleave most transcripts at least once. The AE defines
the position of each SAGE tag along a transcript.
Linker molecules (40 bp) are subsequently attached to
the digested cDNA. These molecules contain recognition
sites for a type IIS restriction enzyme that will bind
the linker and cleave 12-16 bp away from its binding
site to release a short cDNA tag (SAGE tag). The
released molecules (consisting of a 40 bp linker and a
~12 bp cDNA tag) are ligated to each other, forming
~102 bp structures containing ditags (two tags linked
tail to tail). These 102 bp fragments are amplified using
primers that bind to the 40 bp linkers. Purified 102 bp
DNA is cleaved with the AE to release 22-24 bp
ditags, which are ligated into long concatemers for
cloning into a plasmid vector. The plasmids containing
ditag concatemers are sequenced, yielding quantitative
data on the abundance of each SAGE tag. The transcript
from which each SAGE tag was derived is identified
through analysis of sequence databases using software
tools.
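
How the AE fixes the tag position can be sketched in a few lines. The example below is a simplified in silico model, not the laboratory procedure: it assumes the AE site is CATG, takes the 3'-most site in a cDNA sequence, and returns the 10 bases immediately downstream. The function name and the toy sequence are hypothetical.

```python
# Simplified in silico model of tag definition; names and the toy
# sequence are illustrative assumptions.

ANCHOR_SITE = "CATG"  # recognition sequence of the anchoring enzyme
TAG_LENGTH = 10       # tag length used in this study


def extract_sage_tag(cdna, anchor=ANCHOR_SITE, tag_length=TAG_LENGTH):
    """Return the tag following the 3'-most anchor site, or None.

    None is returned when the sequence has no anchor site or when fewer
    than tag_length bases follow the last site.
    """
    cdna = cdna.upper()
    position = cdna.rfind(anchor)  # 3'-most occurrence
    if position == -1:
        return None
    start = position + len(anchor)
    tag = cdna[start:start + tag_length]
    return tag if len(tag) == tag_length else None


toy_transcript = "GGCATGAAATTTCCCGGGCATGTTTAAACCCGGGAAAAAAAA"
print(extract_sage_tag(toy_transcript))  # TTTAAACCCG
print(extract_sage_tag("AAAATTTT"))      # None (no CATG site)
```

A transcript without any CATG site yields no tag in this model, which mirrors the requirement that the AE cleave most transcripts at least once.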

SAGE has been successfully applied in a number of
different systems. For example, it has been used to: (a)
characterize the entire repertoire of expressed transcripts
in yeast [12]; (b) identify p53-regulated genes
[13,14]; (c) compare differential gene expression between
normal and cancerous human cells [15-18]; and
(d) profile gene expression in rice seedlings [19]. In
summary, SAGE lends itself as an extremely efficient
tool for qualitative and quantitative monitoring of
global gene expression. The high level of accuracy and
sensitivity, as well as the depth of coverage achieved by
SAGE, accounts for its comparative advantage over
other methods of transcript profiling
(http://www.genzyme.com).

Here, the application of this technique in P. falciparum
is reported. It was possible to generate a preliminary
SAGE library of 6880 tags from the asexual
blood stages of 3D7 strain parasites. To the authors’
knowledge, this is the first use of SAGE for profiling
malarial gene expression. More importantly, it was
demonstrated that SAGE is feasible in an organism
whose genome is A-T rich. For instance, a sequence as
short as 10 bp was found to be sufficient for uniquely
identifying parasite genes encoded in both the nuclear
and mitochondrial genomes. Moreover, the high A-T
content did not preclude the use of restriction enzymes
that have been employed in other systems for isolating
tags and hence generating representative SAGE libraries.
Finally, a few modifications to the DNA extraction
and cloning steps of the SAGE protocol
proved useful for bypassing specific problems presented
by the A-T richness of Plasmodium sequence.


2.  Methods 

2.1. Primers and linkers

Biotinylated oligo(dT)20 was obtained from Gibco
BRL and the remaining oligonucleotides were obtained
from Integrated DNA Technologies. All primers were
SDS-PAGE purified. SAGE linker 1 was formed by
hybridizing oligonucleotide 1B
(5'-TCCCTATTAAGCCTAGTTGTACTGCACCAGCAAATCC-3')
to oligonucleotide 1A
(5'-TTTGGATTTGCTGGTGCAGTACAACTAGGCTTAATAGGGACATG-3').
SAGE linker 2 was formed by hybridizing oligonucleotide 2B
(5'-TCCCCGTACATCGTTAGAAGCTTGAATTCGAGCAG-3')
to oligonucleotide 2A
(5'-TTTCTGCTCGAATTCAAGCTTCTAACGATGTACGGGGACATG-3').
Oligonucleotides 1B and 2B included a 3' C7 amino
modification and were phosphorylated at their 5' end.
SAGE linkers 1 and 2 were self-ligated and run on a
12% polyacrylamide gel to determine phosphorylation
efficiency. Only linker pairs that self-ligated to form
di-linkers at an efficiency of 70% or greater were used
in subsequent steps. SAGE linkers 1 and 2 each contain
an overhang for the AE. NlaIII (NEB), which recognizes
the sequence 5' CATG 3', was used as the AE.


Fig. 1. Schematic illustration of the SAGE technique. (1) 3D7 mRNA; (2) is transcribed into double stranded cDNA with a biotinylated oligo(dT)
primer (black oval); (3) cDNA is digested with an AE and bound to streptavidin beads (white oval) to isolate the 3' most AE site of each
transcript; (4) samples are divided and ligated to one of two linkers (hatched boxes); (5) samples are digested with a type IIS enzyme to release
linkers attached to a 10 bp SAGE tag; (6) released tags are blunt-ended and ligated together to form a 102 bp molecule, which is PCR amplified.
22mer ditags are isolated; (7) ditags are ligated to form concatemers, which are cloned and sequenced.
Linkers 1 and 2 also contain sequences recognized by
a type IIS restriction enzyme (called the tagging enzyme)
and a priming site for PCR. BsmFI (NEB),
which recognizes the sequence 5' GGGAC 3', served as
the tagging enzyme. PCR primer sequences for linkers
1 and 2 were 5'-GGATTTGCTGGTGCAGTACA-3'
and 5'-CTGCTCGAATTCAAGCTTCT-3', respectively.
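
The linker design described in this section can be checked computationally. The following sketch is an illustrative consistency check, not a step of the protocol: it uses the oligonucleotide sequences as given above (with line-break hyphens and spacing removed) and tests whether each B strand pairs with its A strand, which bases of the A strand remain unpaired, whether the BsmFI site (GGGAC) abuts the CATG overhang, and whether each PCR primer occurs within its linker. The helper names are hypothetical.

```python
# Consistency check of the linker sequences; helper names are illustrative.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

LINKER_1A = "TTTGGATTTGCTGGTGCAGTACAACTAGGCTTAATAGGGACATG"
LINKER_1B = "TCCCTATTAAGCCTAGTTGTACTGCACCAGCAAATCC"
LINKER_2A = "TTTCTGCTCGAATTCAAGCTTCTAACGATGTACGGGGACATG"
LINKER_2B = "TCCCCGTACATCGTTAGAAGCTTGAATTCGAGCAG"
PRIMER_1 = "GGATTTGCTGGTGCAGTACA"
PRIMER_2 = "CTGCTCGAATTCAAGCTTCT"


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def linker_layout(strand_a, strand_b):
    """Split strand A into (5' extension, duplex, 3' overhang) relative to B.

    Returns None when strand B does not pair anywhere along strand A.
    """
    duplex = reverse_complement(strand_b)
    start = strand_a.find(duplex)
    if start == -1:
        return None
    return strand_a[:start], duplex, strand_a[start + len(duplex):]


for name, a, b, primer in (
    ("linker 1", LINKER_1A, LINKER_1B, PRIMER_1),
    ("linker 2", LINKER_2A, LINKER_2B, PRIMER_2),
):
    extension, duplex, overhang = linker_layout(a, b)
    # GGGAC (BsmFI site) shares its final C with the CATG overhang.
    print(name, extension, len(duplex), overhang,
          a.endswith("GGGACATG"), primer in a)
# linker 1 TTT 37 CATG True True
# linker 2 TTT 35 CATG True True
```

In both linkers the unpaired 3' CATG on the A strand is the overhang that anneals to NlaIII-digested cDNA.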


2.2. Parasite cultures

P. falciparum clone 3D7 (kindly provided by Dr
Dan Carucci, Naval Medical Research Center) was
maintained in continuous culture as described by
Trager and Jensen [20], with some modifications.
Briefly, cultures were grown in RPMI (supplemented
with 0.5% Albumax I, 24 mM sodium bicarbonate, 11
mM glucose, 12 mM TES sodium salt, 1 mM pyruvate,
2 mM glutamine, 0.04 mM hypoxanthine,
0.0005% Gentamycin) at a 5% hematocrit in a 5%
carbon dioxide, 1% oxygen and balanced nitrogen environment.
Cultures were placed on a shaking platform
to minimize multiple invasion of red blood cells
(rbcs) by merozoites. Media and gas were replaced
every day and the percentage of parasitized rbcs (parasitemia)
was determined by thin blood smears. Cultures
at parasitemias of approximately 12%, where the
majority consisted of trophozoite forms (8% trophozoites,
2% rings, 2% schizonts), were harvested for
isolation of total RNA.

2.3. RNA extraction and cDNA synthesis

Total RNA was extracted immediately using Tri-reagent
BD for blood products (Molecular Research
Center), and selected twice on oligo(dT) cellulose using
the Message Maker reagent assembly (Gibco, BRL) to
enrich for mRNA. A total of 10^10 parasites typically
yield approximately 20 µg of mRNA. The integrity of
mRNA samples was checked by gel electrophoresis,
RT-PCR and northern blot analysis using probes for
PfMDR1 and calmodulin (data not shown).

In separate experiments, total RNA was extracted
with either Tri-reagent (Molecular Research Center),
or with Tri-reagent following saponin lysis (1% saponin
in PBS) of parasitized rbcs, which serves to lyse
rbcs while leaving parasites intact. mRNA was purified
from total RNA with the Oligotex mRNA kit (Qiagen).
Each of the different methods outlined above
yielded input mRNA of a high quality, which was
successfully used to generate template for amplification
of ditags (see subsequent sections).

cDNA was synthesized from 5 µg of mRNA with
the cDNA synthesis system (Gibco, BRL) following
the manufacturer's recommendations for protocol 1.
First strand cDNA synthesis was primed with 2.5 µg
of biotinylated oligo(dT)20. The efficiency of oligo(dT)
biotinylation was previously verified by measuring percentage
binding to streptavidin beads (Dynal). The
quality of double stranded cDNA was checked by
agarose gel electrophoresis.

2.4. Definition and isolation of cDNA tags

The entire cDNA sample was digested with 100 U
of the AE, NlaIII (New England Biolabs, NEB), for 2
h at 37°C, in two reaction volumes of 200 µl each.
The digest was extracted with an equal volume of
phenol/chloroform/isoamyl alcohol (Sigma) or PCI
(25:24:1), precipitated with ethanol [200 µl sample, 3
µl glycogen (Boehringer Mannheim), 100 µl 10 M ammonium
acetate (Sigma), 900 µl ethanol] on dry ice for
10 min, and centrifuged at 13 000 rpm for 40 min at
4°C. The pellet was washed once with 70% ethanol
and resuspended in 20 µl of LoTE (3 mM Tris-HCl
pH 7.5, 0.2 mM EDTA pH 7.5; stock solutions from
Gibco-BRL).

The 3' ends of cDNA molecules were isolated
through the binding of biotinylated oligo(dT) to paramagnetic
streptavidin beads. This process exposes a
unique site on each transcript corresponding to its 3'
most AE site. Briefly, the cDNA sample was divided
into two fractions (10 µl each). Each fraction was
incubated with 1 mg of beads [previously washed with
binding/wash solution (5 mM Tris, 0.5 M EDTA, 1 M
NaCl)] in 200 µl of binding/wash solution for 30 min
at room temperature. The bead-bound cDNA samples
were washed twice with 200 µl of binding/wash solution
and once with 200 µl of LoTE.

Each fraction was then ligated to either linker 1 or
2 via the AE overhang. Briefly, 2 µg of either linker
were incubated with bead-bound cDNA in a 40 µl
reaction volume at 50°C for 2 min, followed by a 15
min incubation at room temperature. Ten units of T4
DNA ligase together with its supplied buffer (Gibco,
BRL) were added to the reaction and placed at 16°C
for 2 h. Next, the bead-bound cDNA-linker samples
were washed four times with 100 µl of binding/wash
solution, transferred to a new 1.5 ml tube, and washed
once with 100 µl of binding/wash solution and 100 µl
of 1x NEB buffer 4.

Short SAGE tags were released from cDNA
molecules by incubation with the tagging enzyme,
BsmFI. Briefly, the bead-bound cDNA-linker sample
was incubated with 2 U of BsmFI (NEB) for 1.5 h at
65°C, in a 100 µl reaction volume. This served to
release a 10 bp fragment of cDNA attached to a 40 bp
linker molecule. The supernatant was then transferred
to a fresh 1.5 ml tube, PCI extracted, ethanol precipitated,
washed twice, and resuspended in 11 µl of
LoTE.
2.5. Generation and PCR amplification of 102 bp ditag molecules

Released tags were incubated with the Klenow fragment
of DNA Polymerase I to produce blunt-ended
products, and then ligated to each other to form a
ditag-containing molecule. Briefly, each fraction of released
tags was blunt-ended by incubation with 5 U of
the Klenow fragment of DNA Polymerase I (NEB) for
30 min at 25°C in a 50 µl reaction volume containing
1x second strand buffer (a component of the cDNA
synthesis system; Gibco, BRL), and 0.025 mM each
dNTP. The samples were PCI extracted, ethanol precipitated,
and washed as above. Pellets were resuspended
in 6 µl of LoTE.

A small aliquot of released tags was radiolabeled to
assess the quality of manipulations up to this point.
Briefly, 1 µl of released tags was incubated with 1x
second strand buffer (cDNA synthesis system; Gibco,
BRL), 0.03 mM each of dCTP, dGTP, dTTP, 2.5 U
of the Klenow fragment of DNA Polymerase I and 1
µl of α-32P dATP as above. The reaction was run on a
polyacrylamide gel and exposed wet to autoradiographic
film for 20 min at -80°C.

The two fractions of blunt-ended tags were ligated
to each other, forming a molecule containing a ditag.
Briefly, 2 µl from both fractions were incubated together
at 16°C for 16 h, in a 6 µl reaction volume
containing 4 U of T4 DNA ligase and the supplied
buffer (Gibco, BRL). Two sets of ligation reactions
were set up. A control that lacked ligase was also
included. The ligation reactions were PCI extracted,
ethanol precipitated and resuspended in 30 µl of
LoTE.

The ligated products were then amplified by PCR to
generate sufficient material from which ditags could be
isolated. Briefly, these samples (including the ligase
minus control) were diluted 20-fold for use as PCR
template. PCR reactions (50 µl reaction composed of
16.6 mM ammonium sulfate, 67 mM Tris pH 8.8, 6.7
mM magnesium chloride, 10 mM β-mercaptoethanol,
6% DMSO, 0.375 mM each dNTP, 350 ng of each
SAGE primer, 5 U Taq polymerase (Perkin-Elmer)
and 1 µl of template) were set up in Hot Start tubes
(Gibco, BRL). Amplification was carried out for 27
cycles of 30 s at 95°C, 1 min at 55°C, and 1 min at
72°C, with initial heat activation for 1 min at 95°C
and final extension for 5 min at 72°C. An aliquot of
the PCR reaction was resolved on a 12% polyacrylamide
gel to check for the presence of a 102 bp
product (consisting of a 22 bp ditag flanked on either
end by 40 bp linker sequence), expected in ligase plus
samples only. An 80 bp product is also generated
during PCR (two 40 bp linker molecules ligated together
to form a di-linker). To generate sufficient 102
bp product for 22 bp ditag isolation (see next section),
multiple PCR reactions (288 or 480) were set up as
described above.

The 102 bp product was gel-purified in the following
manner. All PCR reactions were pooled, PCI extracted,
ethanol precipitated and resuspended in 360 µl
of LoTE. The material was run on three 12% polyacrylamide
gels and stained with ethidium bromide.
Gels were shielded with plexi-glass when visualizing
bands with 300 nm UV trans-illumination. The 102 bp
band was excised across the width of each gel and
divided into three slices. Each gel slice was fragmented
by spinning through a 0.5 ml tube, which was pierced
with an 18 gauge needle. The sample was collected in
a 1.5 ml tube. DNA was eluted from the gels by
placing the gel in 300 µl LoTE and 100 µl 10 M
ammonium acetate at 4°C overnight. The samples
were heated at 37°C for 2 h, 65°C for 15 min, and
gradually cooled to room temperature. Polyacrylamide
was removed on SpinX columns (Costar) and the
DNA was PCI extracted, precipitated and resuspended
in a total volume of 126 µl LoTE.

In separate experiments, SYBR green I stain
(Molecular Probes) was used for detecting DNA in
polyacrylamide gels. SYBR green is 25 times more
sensitive than ethidium bromide, resulting in intense
background staining and smearing. This made it
difficult to cleanly isolate the 102 bp ditag bands from
the 80 bp di-linkers (Fig. 4). Contamination of the
102mer reduces the yield and average size of tag concatemers
at subsequent steps [21].

2.6. Isolation of 22 bp ditags and concatenation

Digesting the 102 bp product with the AE resulted
in the release of 22 bp ditags. Briefly, the 102 bp gel
purified fragment was incubated with 240 U of NlaIII
(in two reaction volumes of 150 µl each) at 37°C for 2
h. The reaction was PCI extracted, precipitated, resuspended
in 32 µl of LoTE, and loaded on three lanes of
a 12% polyacrylamide gel. The ditags running at 22-26
bp were excised and eluted as above, except that
the heating step at 65°C was omitted. The sample was
resuspended in 7 µl of LoTE.

Ditags were concatenated into single molecules, with
5 U of T4 DNA ligase in a total volume of 10 µl, at
16°C for 16 h. Concatenation of tags allows for efficient
sequencing of multiple tags from a single clone.
The concatemer sample was heated at 65°C for 15
min, chilled on ice for 10 min and loaded on one lane
of an 8% polyacrylamide gel. Concatemers resolved as
a smear on the gel. Three size fractions (100-400;
400-800; and >800 bp) were excised and gel-purified
as for the 102mer. Each size fraction of gel-purified
concatemer sample was resuspended in 6 µl of
LoTE.
2.7. Cloning and sequence analysis

All manipulations of the pZero-1 plasmid (Invitrogen)
were performed using solutions provided by the
company to limit the introduction of endonucleases.
Two µg of pZero-1 were incubated with 7 U SphI at
37°C for 1.5 h. The digest was extracted, precipitated
and resuspended in 60 µl LoTE. The vector sample was
diluted 5- and 10-fold, and 1 µl from each dilution was
ligated separately to 3 µl of concatemer insert (in a
reaction volume of 5 µl) at 16°C for 16 h. Ligation
reactions with vector alone were set up as negative
controls. A total of 5 U of T4 DNA ligase were used
per reaction. The samples were subsequently PCI extracted,
ethanol precipitated, washed four times with
70% ethanol and resuspended in 6 µl of LoTE.

Each sample (1 µl) was electroporated into ElectroMax
DH10B cells (Gibco, BRL) and plated out on low
salt LB plates containing IPTG, zeocin, and X-gal.
Colonies were screened directly for inserts by PCR,
utilizing the M13 forward and reverse sequences flanking
the cloning site as primers. PCR amplification was
carried out as before (see Section 2.5), except that 60 ng
of each M13 primer and 1 U of Taq polymerase were
used in each reaction. An alternate and more rapid
method of screening involved lysing bacteria in 50 µl of
buffer composed of 50 mM sodium hydroxide, 0.5%
SDS, 5 mM EDTA, and 0.025% Bromocresol green at
65°C for 45 min. Samples were mixed with 1 µl of 30%
glycerol and run on a 1% agarose gel to determine plasmid
size. Selected clones were grown in 96 well plate
format on a shaking platform at 37°C for 24 h.
Glycerol stocks (50%) of bacterial clones were prepared.
Automated sequencing of these was performed
with dye terminator chemistry at The Institute for Genomic
Research (Rockville, MD) and The Walter Reed
Army Institute of Research (Silver Spring, MD) using
M13 forward and reverse primers.

Sequence data were analyzed using SAGE software
(Genzyme), which identifies cDNA ditag sequence
flanked by AE sites in order to extract 14 bp tag counts
(4 bp NlaIII site and 10 bp tag sequence), and also
compares experimental tag data to GenBank sequence
databases. In essence, the software created a database of
all potential 14 bp NlaIII tags from Plasmodium sequences
registered by the Malaria Genome Consortium
in the NCBI Malaria Genetic and Genomics website
(www.ncbi.nih.gov/Malaria/) and linked each tag to
gene annotations in the NCBI database (as of February
21, 2000). The experimental SAGE library was compared
against this data set. When tags matched
sequence reads that had not yet been annotated, a
500-1000 bp sequence surrounding the tag was translated
in all reading frames and compared to the entire
NCBI protein database using BLASTx.
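
The tag-extraction step can be sketched as follows. This is a simplified illustration of the general approach and not a reproduction of the SAGE software: it assumes a clean concatemer read, splits it at CATG sites, keeps interior pieces of plausible ditag length, and reads one 10 bp tag from each end of a ditag (the second on the opposite strand). All function names, the length window and the toy read are hypothetical.

```python
# Simplified sketch of ditag parsing; not the Genzyme SAGE software.
from collections import Counter

ANCHOR = "CATG"   # anchoring enzyme site flanking every ditag
TAG_LENGTH = 10   # bases of tag sequence kept per tag
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def ditags_from_concatemer(read, min_len=20, max_len=26):
    """Return pieces lying between consecutive anchor sites.

    The first and last pieces are treated as vector flank and dropped;
    the length window is an assumed filter for plausible ditags.
    """
    pieces = read.upper().split(ANCHOR)
    return [p for p in pieces[1:-1] if min_len <= len(p) <= max_len]


def tags_from_ditag(ditag):
    """Return the two 14 bp tags (anchor site + 10 bp) held in one ditag."""
    left = ANCHOR + ditag[:TAG_LENGTH]
    right = ANCHOR + reverse_complement(ditag[-TAG_LENGTH:])
    return left, right


def count_tags(reads):
    """Tally 14 bp tags across a collection of concatemer reads."""
    counts = Counter()
    for read in reads:
        for ditag in ditags_from_concatemer(read):
            counts.update(tags_from_ditag(ditag))
    return counts


toy_read = ("GGTACC" + "CATG" + "AAAAACCCCCTTGGGTTTAAAC"
            + "CATG" + "ACGTACGTACGATTGACCTGAA" + "CATG" + "TTAAGG")
for tag, n in sorted(count_tags([toy_read]).items()):
    print(tag, n)
# CATGAAAAACCCCC 1
# CATGACGTACGTAC 1
# CATGGTTTAAACCC 1
# CATGTTCAGGTCAA 1
```

The resulting 14 bp strings are the keys that would then be looked up in a database of all potential NlaIII tags from genomic sequence.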


Tags that did not match the P. falciparum NCBI
database were searched against a composite assembly
of all available genomic P. falciparum sequences (as of
February 2000), kindly provided by Drs Jessica
Kissinger and David Roos (University of Pennsylvania).
This database contains the most complete and
up-to-date genome sequence from P. falciparum, but is
not annotated. Hence tags that gave matches were
further characterized by the BLASTx function described
above. Any 14 bp tags failing to match either
database were analyzed as before using only the first 13
bp of tag sequence. The length of an actual tag can
vary between 12 and 16 bp since BsmFI does not cut
exactly 10 bp away from its recognition site. Other
SAGE studies have used 13 bp tags in their analyses
[9,19].


3.  Results  and  discussion 

In this report, the feasibility of the SAGE methodology
as applied to asexual stages of P. falciparum is
demonstrated. It was possible to generate
ditags from parasite cDNA and construct a SAGE
library consisting of approximately 6880 tags. Furthermore,
it was found that the AE formerly used in other
systems (with balanced nucleotide distributions) could
effectively define and isolate tags despite the highly
A-T-rich composition of parasite sequence. In spite of the
lower complexity of P. falciparum DNA, tags as short
as 10 bp were sufficient to uniquely identify parasite
transcripts from both nuclear and mitochondrial
genomes. This A-T richness may have, however, contributed
to decreased ditag yields. Since SAGE is a
multi-step process, reduced efficiency of any single
step will impact all those downstream. For example,
lowered amounts of 22 bp ditag translated into reduced
amounts of concatemer insert, which in turn affected
cloning efficiency. To help overcome these problems, a
few modifications were applied to DNA extraction and
cloning steps of the established SAGE protocol.

3.1. Occurrence of AE sites

Fig. 2. Size distribution of NlaIII-digested cDNA. Double stranded
cDNA from P. falciparum was digested with NlaIII and electrophoresed
on a 1% agarose gel (lane 3). Undigested cDNA was
resolved in lane 2. Lambda DNA-Hind III digest (NEB) (lane 1) and
pBR322 MspI digest (NEB) (lane 4) were used as the markers. DNA
was stained with ethidium bromide.

Despite the A-T richness of malarial sequence, it
was found that the occurrence of NlaIII sites (5' CATG
3') is relatively frequent in parasite DNA; hence NlaIII
was chosen as the AE in this system as in others. As
mentioned earlier, the AE defines the position of each
tag within a transcript and hence should cleave most
mRNA molecules at least once, in order to generate
truly representative SAGE libraries. In yeast and mammalian
genomes, NlaIII cleaves every 256 bp, while
most transcripts are much larger. The frequency of
NlaIII sites in P. falciparum cDNA is lower, around
once every 400 bp as calculated from the occurrence of
CATG in chromosomes 2 and 3. After NlaIII digestion,
the size distribution of parasite cDNA collapses from
between 0.1 and >9.4 kb to between 0.1 and 2.3 kb
(see Fig. 2). Since gene density is estimated at one gene
every 4.5 kb, NlaIII is still expected to cleave most
parasite transcripts. Completion of the Genome Project
will reveal those genes which lack NlaIII sites altogether.
Another issue relating to the AE is the creation
of NlaIII sites by exon splicing. Such sites will be
missed in the analysis, since SAGE tags are searched
against genomic sequence of Plasmodium. EST databases
and cDNA sequence will improve the analysis of
Plasmodium SAGE tags.
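
The site spacings discussed above can be compared with a simple expectation. The sketch below assumes independent bases with a fixed A-T fraction split evenly between A and T (and the remainder evenly between C and G); this is a textbook approximation introduced here for illustration, not the calculation performed on chromosomes 2 and 3. The function name is hypothetical.

```python
# Expected spacing of a restriction site under an independent-base model.
# The model and the function name are illustrative assumptions.

def expected_site_spacing(site, at_fraction):
    """Mean distance (bp) between occurrences of `site` for a given A-T fraction."""
    base_probability = {
        "A": at_fraction / 2,
        "T": at_fraction / 2,
        "C": (1 - at_fraction) / 2,
        "G": (1 - at_fraction) / 2,
    }
    site_probability = 1.0
    for base in site.upper():
        site_probability *= base_probability[base]
    return 1 / site_probability


print(expected_site_spacing("CATG", 0.5))          # 256.0 (balanced genome)
print(round(expected_site_spacing("CATG", 0.7)))   # 363
print(round(expected_site_spacing("CATG", 0.8)))   # 625
```

Under this model, an A-T content of 70-80% gives an expected CATG spacing of roughly 360-625 bp, which brackets the value of about 400 bp reported for P. falciparum cDNA.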

Alternatively, enzymes whose restriction sites occur
more frequently in parasite DNA could serve as the
AE. However, when NdeI and VspI (enzymes that
recognize the A-T rich sequences 5'-CATATG-3' and
5'-ATTAAT-3', respectively) were tested as the
anchoring enzyme in parallel experiments, neither
proved appropriate. In both experiments, sufficient
102mer was produced, but detectable amounts of 22 bp
ditag were not released upon digestion with either
enzyme. To understand the basis of this result, the 102
bp fragments were cloned into T-A vectors and the
presence of ditags subsequently checked by sequence
analysis. Approximately half of the 102 bp fragments
contained bona fide ditags, while half consisted of two
linker molecules each flanking 22-28 bp stretches of As
or Ts that lacked the AE site. It was postulated that
these aberrant 102mers may represent oligo(dT) that
was carried over from the first strand cDNA synthesis
reaction, and subsequently annealed to the short 3'-5'
A-T overhangs present on linkers designed to ligate to
NdeI- or VspI-digested cDNA. From these data, both
the frequency of restriction sites and the overhang
need to be taken into account, and the effectiveness of new
anchoring enzymes will have to be determined
empirically.

3.2. Generation of 102 bp PCR product and 22 bp ditags

A key step in the SAGE methodology is the generation of 102 bp PCR product since its yield determines the amount of 22 bp ditag recovered. An increased yield of 22 bp ditag in turn enhances cloning and sequencing efficiency downstream. It was demonstrated that sufficient amounts of 102 bp product can be generated from parasite cDNA.

Fig. 3. Blunt-ended cDNA tags. Released cDNA tags from either one of two fractions (one fraction was ligated to linker 1 and the other to linker 2) were incubated with the Klenow fragment of DNA polymerase I and radiolabeled dATP to produce blunt-ended molecules. The blunt-ended tags from fraction 1 (lane 1) and fraction 2 (lane 2) are visible at ~50 bp (40 bp linker + 10 bp cDNA tag) (solid arrow). A 25 bp ladder (Gibco) was used as the marker (lane 3). Samples were run on a 12% polyacrylamide gel and visualized by autoradiography.

Fig. 4. PCR amplification of ditags. Blunt-ended cDNA tags (attached to linkers) were ligated to each other with T4 DNA ligase and this ligation sample was used as template for 21 cycles of PCR (lane 1). cDNA tags incubated without T4 DNA ligase were also PCR amplified as negative controls (lane 2). The product at ~100 bp in lane 1 corresponds to the amplified ditag containing molecule (see solid arrow). The bands at ~80 bp (lane 1) correspond to di-linker formed by ligation of contaminating 40 bp linker molecules. A 25 bp ladder (Gibco) was used as a marker (lane 3). DNA was run on a 12% polyacrylamide gel and stained with ethidium bromide.

To monitor the quality of manipulations leading up to 102 bp formation, cDNA tags released by BsmFI were radiolabeled (Fig. 3). The expected band at approximately 50 bp (40 bp linker attached to a 10 bp cDNA tag) is clearly visible in both fraction 1 and 2.
Upon ligation of blunt-ended tags and subsequent PCR amplification, the expected products at 102 bp were obtained (Fig. 4). Bands at 80 bp, corresponding to contaminating di-linker molecules, were also present. Vogelstein and coworkers [22] report a typical yield of 10-20 µg of 102 bp product after gel purification from 96 PCR reactions whereas one obtained 10-20 µg of purified 102 bp product from 288 PCR reactions. This lower yield may be related to the high A-T content of Plasmodium sequence. For example, lower AE frequency (as described earlier) could result in fewer sites for generating SAGE tags, and hence reduced amounts of template for PCR amplification of the 102 bp band. In an attempt to optimize PCR, the concentration of dNTP was varied (between 0.0075 and 1.5 mM of each dNTP), as was the ratio of dATP/dTTP to dGTP/dCTP (1.5 mM dATP/dTTP:1.5 mM dCTP/dGTP; 3 mM:1 mM; and 0.3 mM:0.1 mM). Varying the relative ratios of dNTPs in this manner was previously shown to improve PCR amplification of A-T rich mitochondrial sequences [23]. Platinum Taq polymerase was also tested (Gibco, BRL). These modifications did not improve PCR amplification of 102 bp product; instead it was observed that dNTP concentrations above 0.375 mM each were inhibitory to PCR. Hence, in some experiments, 20 µg of 102 bp product was obtained by increasing the number of scale-up PCR reactions by approximately 2-fold.

Fig. 5. Isolation of 22 bp ditags. The 102 bp PCR amplified product was gel purified and digested with NlaIII to cleave off 40 bp linkers from both ends (open arrow, lane 3) and to release the 22 bp ditag (solid arrow, lane 3). The band at ~75 bp corresponds to partially digested 102 bp products. Undigested gel purified 102 bp product is shown in lane 2. A 25 bp ladder (Gibco) was used as a marker (lane 1). DNA was run on a 12% polyacrylamide gel and stained with ethidium bromide.

Table 1
Summary of 3D7 SAGE library analysis

                                                            3D7 parasite population
Total number of clones                                      575
Total number of tags                                        6880
Total number of tags after excluding linker-derived tags    6702
Average number of tags/clone                                12
Percentage of duplicate ditags (a)                          3
Total number of genes                                       4146

(a) Duplicate ditags include ditag sequences that are observed more than once. The percentage of duplicate ditags is calculated by multiplying the number of duplicate ditags by 2 and dividing by the total number of tags.

NlaIII digestion of the 102 bp molecule released 22 bp ditags as expected (Fig. 5). The quantity and quality of these ditags are crucial for determining downstream concatemerization and cloning efficiency. Other studies report ditag yields of several hundred nanograms (500-1000 ng) [21]. One has consistently obtained 100-300 ng of 22 bp ditags, despite increases in PCR scale up. This indicates that, for SAGE in P. falciparum, the same amount of 102 bp product yields significantly lower quantities of 22 bp ditags compared to other systems. This reduced yield may be due to the fact that A-T rich SAGE tags are predicted to have lower melting temperatures, which in turn might result in the loss of 22 bp ditags during gel electrophoresis and extraction. Addition of salt, gradual cooling of the DNA sample to room temperature, and minimization of UV exposure during extraction of both the 102mer and 22 bp ditags marginally improved the yield as assessed by cloning efficiency, since the percentage of clones containing concatemer inserts increased by 10% (see next section).
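
The effect of base composition on duplex stability can be illustrated with a rough calculation. The text does not state how melting temperatures were predicted, so the sketch below substitutes the simple Wallace rule (2 °C per A/T and 4 °C per G/C), which is only a coarse guide for short oligonucleotides and ignores salt and nearest-neighbor effects. The function name is hypothetical; the two example sequences are tags listed in Table 2.

```python
def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule.

    2 degrees per A/T plus 4 degrees per G/C. This is a coarse
    approximation for short oligos, used here only for comparison.
    """
    seq = seq.upper()
    at = sum(seq.count(base) for base in "AT")
    gc = sum(seq.count(base) for base in "GC")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # An A-T rich tag and a G-C rich tag, both taken from Table 2
    for tag in ("GGATATAAAA", "CTCAGCCGCC"):
        print(tag, wallace_tm(tag))
```

With this rule the A-T rich tag scores 24 and the G-C rich tag 36, which shows the direction of the effect proposed in the text rather than actual melting temperatures of 22 bp ditags.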

The  second  parameter  is  the  purity  of  ditags  in  the 
concatemer  reaction.  After  PCR  amplification  and 
NlaIII digestion of the 102 bp PCR product, the resulting products contained large amounts of 80 bp di-linker
and  40  bp  linker  respectively.  Although  the  102  and  22 
bp  ditags  are  gel-purified,  excess  linker  material  could 
run  aberrantly  on  the  gel  and  serve  to  poison  the 
concatemerization  reaction,  by  preventing  the  extension 
and  cloning  of  concatemers.  Excessive  contamination 
with  80  bp  di-linkers  was  ruled  out  by  gel  analysis  of  a 
small  aliquot  of  the  purified  102  bp  product  (Fig.  5), 
which  showed  by  ethidium  bromide  staining  that  the 
sample  contained  <1%  contamination.  Surprisingly, 
upon  digestion  of  the  102  bp  fragment  (>99%  pure), 
the relative ratio of 22 bp ditag to 40 bp linkers was
approximately  1:8  in  the  present  study  while  this  ratio 
should  be  approximately  1:2.  Hence  stoichiometric 
amounts  of  ditag  insert  were  not  recovered  in  the 
present  study.  Perhaps,  due  to  their  high  A-T  content, 
the  22  bp  molecules  dissociate  and  are  lost  during  gel 
electrophoresis. 
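
The expected 1:2 ratio follows from the structure of the 102 bp product, which consists of one 22 bp ditag flanked by two 40 bp linkers. The short sketch below makes that arithmetic explicit and estimates how much ditag was recovered relative to expectation. It treats the reported ratios as ratios of molecule counts and assumes complete linker recovery; the text does not state either point, so both are assumptions made for illustration.

```python
from fractions import Fraction

LINKER_BP = 40
DITAG_BP = 22

# One PCR product = linker + ditag + linker
product_bp = 2 * LINKER_BP + DITAG_BP

expected = Fraction(1, 2)   # ditag : linker expected from one product
observed = Fraction(1, 8)   # ditag : linker reported in the text

# Assuming linkers are fully recovered, the recovered share of the
# expected ditag is the observed ratio divided by the expected ratio.
recovered_fraction = observed / expected

if __name__ == "__main__":
    print(product_bp)                  # 102
    print(float(recovered_fraction))   # 0.25
```

Under these assumptions only about a quarter of the expected ditag is recovered, consistent with the loss of ditag discussed above.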

3.3.  Cloning 

Ditags  were  successfully  concatenated  and  cloned 
into  the  pZero  vector.  Each  clone  contained  an  average 
of  12  tags  (see  Table  1),  a  number  similar  to  those 
obtained by several other investigators [24,25]. However, cloning efficiency was compromised by lower ditag yields in preceding steps. Several modifications were
applied  to  the  cloning  protocol  accordingly. 


Upon transformation of Escherichia coli with ligated plasmids, pZero utilizes its ccdB-LacZ fusion gene as a lethal selection against colonies containing no insert (Invitrogen); however, in the authors' hands, this selection was leaky. Hence, decreased insert concentrations resulted in an extremely low frequency of insert-containing colonies (4%). This problem was compensated for by reducing the concentration of pZero in the ligation by 5- and 10-fold. This served to increase the frequency of clones with inserts. One routinely obtains frequencies between 30 and 47%, with the higher frequencies relating to the use of high salt elution during preparation of 102 and 22 bp molecules.

In order to bypass the need for screening clones, the transformations were plated on X-gal plates, exploiting
the  LacZ  marker  in  a  double  selection.  Of  the  clones 
grown  in  the  absence  of  X-gal,  31%  were  positive  for 
insert;  on  the  other  hand,  84%  of  all  white  colonies 
selected  from  X-gal  plates  contained  insert.  Double 
selection  of  clones  may  prove  useful  when  establishing 
SAGE  libraries  in  systems  with  limited  amounts  of 
RNA  [25-27]. 

3.4.  Sequence  analysis 

A SAGE library of approximately 6880 tags has been generated from 3D7 strain parasites. This library is currently being expanded to provide a more comprehensive expression profile, which will be presented in Patankar et al. (in preparation). A preliminary analysis of sequence data from the current library is presented in Table 1. A total of 4146 different genes were represented in the 3D7 library. Of these, 1047 genes were represented by tags at an abundance level of 2 or greater (some tags at an abundance of 1 may be produced by sequencing error). The percentage of linker-derived tags, i.e. tags corresponding to linker sequence, as well as the percentage of duplicate ditags (repeated ditags) were both only 3%. The percentage of duplicate ditags provides a measure of biased PCR. In other SAGE studies, this percentage ranges from 4% to as much as 10% (http://www.sagenet.org).
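
The duplicate-ditag percentage defined in the footnote to Table 1 can be computed from a list of sequenced ditags. The following is a minimal sketch, not the software used in the study. It assumes that a "duplicate" is every occurrence of a ditag sequence beyond its first and that each ditag contributes exactly two tags; the footnote leaves both points open. The function name and the toy input in the usage note are hypothetical.

```python
from collections import Counter
from typing import Iterable


def duplicate_ditag_percentage(ditags: Iterable[str]) -> float:
    """Percentage of tags that come from repeated ditags.

    Follows the Table 1 footnote: (duplicate ditags x 2) / total tags.
    Assumption: every occurrence of a ditag beyond its first counts as
    a duplicate, and each ditag contributes exactly two tags.
    """
    counts = Counter(ditags)
    total_ditags = sum(counts.values())
    if total_ditags == 0:
        return 0.0
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    total_tags = 2 * total_ditags
    return 100.0 * (2 * duplicates) / total_tags
```

For example, a toy list of six ditags in which one sequence appears twice and another three times contains three duplicates among twelve tags, so the function returns 50.0.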

To determine whether 10 bp SAGE tags could uniquely identify parasite genes, BLAST analysis of the 256 most abundant tags (tags at abundance level of 4 or greater) was conducted: 66% of these tags matched to unique sites in the Plasmodium genome, 20% of the tags did not match registered sequence in either database, and 14% matched to more than one locus. In other systems, there have also been several instances where two or more genes share the same tag, i.e. some SAGE tags match to more than one locus in the sequence database. Northern blot analysis of tags in all three classes is underway and should help resolve
whether tags that match multiple genes indeed represent multiple transcripts.

Examples of highly abundant SAGE tags are listed in Table 2. These were derived from both the nuclear genome as well as the 6 kb mitochondrial element. Tags in the latter group map to intergenic regions of the mitochondrial mRNAs, where small (40-190 nt) highly fragmented rRNA molecules are encoded on both DNA strands [28,29]. The 6 kb element is polycistronically transcribed, and transcripts containing adjacent mRNA and rRNA sequences have been found [30]. Hence it is unclear whether the SAGE tags are indeed derived from rRNA molecules or from precursor intermediates of polycistronic transcription in the mitochondria. Interestingly, thioredoxin, a nuclear encoded transcript, is also expressed at high levels, consistent with the abundant expression of genes involved in maintaining mitochondrial physiology and function. Tags corresponding to parasite specific surface proteins that are required for erythrocyte invasion, such as MSP-3 (merozoite surface protein 3), SERA (serine repeat antigen) and Rhop H3 (rhoptry protein), were also abundant. Finally, several unknown genes and hypothetical ORFs were also highly expressed. Hence SAGE will prove useful for assigning ORFs to previously uncharacterized sequence reads generated by the Genome Project.
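
The tag-to-genome matching described in this section can be sketched in a few lines. The study used BLAST against sequence databases; the code below replaces that with a plain exact-match search, so it is a simplified stand-in and not the authors' procedure. It extracts the 10 bp tag that follows the 3'-most NlaIII site (CATG) of a cDNA and labels a tag as "none", "unique" or "multiple" according to how often the site plus tag occurs on either strand of a genome string. All function names are hypothetical.

```python
from typing import Optional

ANCHOR = "CATG"      # NlaIII recognition site
TAG_LEN = 10         # tag length used in the text

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(_COMP)[::-1]


def extract_tag(cdna: str) -> Optional[str]:
    """Return the 10 bp following the 3'-most anchor site, or None."""
    cdna = cdna.upper()
    pos = cdna.rfind(ANCHOR)
    start = pos + len(ANCHOR)
    if pos == -1 or start + TAG_LEN > len(cdna):
        return None
    return cdna[start:start + TAG_LEN]


def count_hits(query: str, target: str) -> int:
    """Count overlapping occurrences of query in target."""
    hits, i = 0, target.find(query)
    while i != -1:
        hits += 1
        i = target.find(query, i + 1)
    return hits


def classify_tag(tag: str, genome: str) -> str:
    """Label a tag 'none', 'unique' or 'multiple' by exact matches.

    Both strands of the genome are searched for anchor + tag.
    """
    genome = genome.upper()
    query = ANCHOR + tag.upper()
    hits = count_hits(query, genome) + count_hits(query, revcomp(genome))
    if hits == 0:
        return "none"
    return "unique" if hits == 1 else "multiple"
```

Because the search runs against genomic sequence, a tag whose NlaIII site is created by exon splicing would be labeled "none" here, which mirrors the limitation noted earlier for tags searched against genomic sequence.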

Table 2
Highly expressed genes (a)

Tag sequence    Database match                          Abundance
TCAGGCGTTA      Mitochondrial 6 kb transcript           1.30
GAGCAAGCAG      No match                                0.58
ATTTGAAGCA      Rhop H3                                 0.42
CTCAGCCGCC      Mitochondrial 6 kb transcript           0.39
GTAGTTGACA      Mitochondrial 6 kb transcript           0.36
CGAGGAAAAA      SERA                                    0.27
AACGACAAGA      Pfg 27/25                               0.25
TACAGCTGCT      MSP-3 (merozoite surface protein 3)     0.20
GGGAAAGCGA      Hypothetical ORF                        0.19
GGCACAACTA      Thioredoxin                             0.16
GGATATAAAA      Unknown protein                         0.16

(a) Examples of the highly abundant SAGE tags from the 3D7 control library and their corresponding genes are listed. Tag sequence represents the 10 bp SAGE tag sequence adjacent to the NlaIII site. Abundance is listed as a percentage of all 6702 tags in the SAGE library.

Additionally it was found that 11 of the 256 most abundant transcripts were represented by more than one 10 bp tag sequence; nine genes were represented by two different tags each, and two genes by three different tags. This phenomenon could result from a partial digest of parasite cDNA, thereby generating many tags at sites other than the 3' most NlaIII site for a given gene. However, in such a scenario, the 3' most NlaIII site of a transcript might be expected to generate the most abundant SAGE tag; in the present study, the most abundant tag for a given gene was not always the one located at the most 3' position. Hence, it was postulated that the high A-U content of P. falciparum RNA permits internal priming by oligo(dT) during the cDNA synthesis step of the SAGE protocol, resulting in the generation of more than one tag from a single gene. In fact, similar internal priming by oligo(dT) at poly(A) stretches within mRNA transcripts was observed in other SAGE studies [31] as well as during RT-PCR amplification of P. falciparum genes corresponding to ABRA (acidic basic repeat antigen), SERA and elongation factor EF-1α in the laboratory (data not shown). Multiple tags that match a single gene have been observed in other systems; the abundance of such a transcript was calculated by adding all multiple tag counts [12,32].
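
The abundance calculation described above, in which counts for all tags assigned to one gene are added and expressed as a percentage of the library, can be sketched as follows. This is an illustrative helper, not the analysis software used in the study; the function name, argument names and the gene labels in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional


def gene_abundance(tag_counts: Dict[str, int],
                   tag_to_gene: Dict[str, str],
                   total_tags: Optional[int] = None) -> Dict[str, float]:
    """Percent abundance per gene, summing all tags assigned to it.

    total_tags defaults to the sum of tag_counts. Pass the library
    size explicitly (for example 6702 for the library in Table 1)
    when tag_counts covers only part of the library.
    """
    if total_tags is None:
        total_tags = sum(tag_counts.values())
    per_gene: Dict[str, int] = defaultdict(int)
    for tag, n in tag_counts.items():
        gene = tag_to_gene.get(tag)
        if gene is not None:          # tags without a match are skipped
            per_gene[gene] += n
    return {g: 100.0 * n / total_tags for g, n in per_gene.items()}
```

As a usage example with made-up counts, two tags observed 3 and 2 times that both map to a placeholder gene "geneX" give that gene an abundance of 5.0% in a library of 100 tags.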

Interestingly, internal priming of cDNA at poly(A) tracts within genes confers some advantages to the SAGE procedure in P. falciparum. For example, it is unclear whether the entire pool of mitochondrial transcripts in the malarial parasite is polyadenylated; transcriptional mapping of mitochondrial mRNA and rRNA molecules has shown that these possess very short (6-20 bp) or non-existent poly(A) tails [33,34]. However, tags mapping to the 6 kb mitochondrial element were found at high abundance as mentioned earlier. Hence the apparent lack of long poly(A) tails on mitochondrial transcripts did not exclude them from representation in the SAGE library. Binding of internal poly(A) tracts within parasite transcripts to oligo(dT) columns during the mRNA selection steps of SAGE as well as internal priming during the cDNA synthesis step allows the inclusion of differentially polyadenylated RNAs in the analysis.

In conclusion, it was demonstrated that P. falciparum is amenable to the SAGE technique, despite its low genome complexity. In conjunction with other methods of high-throughput transcriptional analysis such as micro-arrays [8], SAGE should yield valuable information about the fundamental biology and virulence mechanisms of an important human pathogen.

The malaria parasite undergoes a complex developmental process through its life
cycle.  This  includes  an  asexual  intraerythrocytic  cycle  in  the  vertebrate  host,  and  a  sexual 
cycle  that  commences  with  gametogenesis  in  the  vertebrate  host  and  subsequent  fertilization 
and  maturation  in  the  mosquito  vector.  Regulation  at  the  transcriptional  and  post- 
transcriptional  levels  is  no  doubt  important  for  the  temporal  expression  of  genes  required  at 
each stage of development. Present understanding of the cis-elements important for transcriptional control in Plasmodium is severely restricted. Sequence analysis of 5' flanking regions of Plasmodium genes reveals the presence of sequences with homology to known eukaryotic control elements, for example see [1, 2]; however, the functional significance of these sequences in Plasmodium has not been demonstrated. The intergenic region in Plasmodium spp. is particularly AT-rich, even within the context of the AT-biased (~80%) genome [3], such that even the identification of TATA-like elements, and assays to determine their utility and importance, becomes difficult. A growing but limited number of functional analyses of promoter regions of Plasmodium genes have been published, many of which shed light on regions that are necessary for efficient expression [2, 4]. However, only a few studies to date have identified specific sequences, short of transcriptional start sites, that appear to be important for gene expression [4-6]. Due to the
small  numbers,  and  the  fact  that  these  genes  are  expressed  at  different  stages  in  the  parasite 
life  cycle,  no  consensus  or  common  sequences  could  be  established.  Clearly,  much  more 
can  be  learned  about  aspects  of  basal  transcription  as  well  as  stage-specific  control  of  gene 
expression  in  the  malaria  parasite. 

Pgs28  is  expressed  abundantly  on  the  surface  of  mosquito  stages  of  the  avian 
parasite,  Plasmodium  gallinaceum.  Pgs28  belongs  to  the  family  of  Pxs  proteins,  which 
includes  the  P.  berghei  homolog  Pbs21  and  the  P.  falciparum  homolog  Pfs25.  These 
proteins  contain  a  series  of  EGF-like  domains  that  may  serve  a  function  in  cell  signalling  or 
in  adhesion  [7,  8].  Pgs28,  Pfs25  and  Pbs21  had  been  identified  as  targets  for  transmission 
blocking  antibodies  [9-12].  Transcripts  of  pbs21  had  been  observed  in  female  gametocytes 
and  gametes,  as  well  as  zygotes  and  ookinetes,  and  the  pfs25  promoter  appears  to  be 
specifically  active  in  mosquito  stage  parasites,  supporting  the  notion  that  the  genes 
encoding this protein family are activated specifically during the sexual stages [5, 13, 14]. Since Pbs21 is initially expressed on the surface of zygote stage parasites, additional post-transcriptional control is exerted by the parasite to regulate Pbs21 expression. We are interested in investigating pgs28 gene expression to further understand transcriptional regulation in Plasmodium spp. and as a step towards understanding the control of sexual development in P. gallinaceum. In this report, we present a functional analysis of the 5' flanking region of pgs28, using firefly luciferase as a reporter, by which we identified two regions that are required for pgs28 transgene expression. Furthermore, using Northern analysis, we define the 5' limit of the pgs28 transcript and demonstrate that pgs28 transcripts are present during the zygote stage.

The 5' and 3' flanking sequence of pgs28, together with an in-frame insertion of the luciferase reporter, were previously cloned into pBS (pgs28.1LUC) [15]. In this study, the pgs28-luc chimera, containing pgs28 5' flanking sequence, the pgs28-luc fusion gene, and about 720 bp of 3' flanking sequence, from pgs28.1LUC was cloned into the HindIII site of pBS to create BSpgs28-LUC. The 1871 bp 5' flanking sequence of pgs28 has been determined and deposited in GenBank. (The sequence and characterization of the 3' region was recently published [16].) Expression from BSpgs28-LUC was confirmed by immunofluorescent antibody staining and immuno-electron microscopy [17] and also by luciferase assays performed 24 or 48 hrs post-transfection (see below). High expression levels up to the order of 10^6 light units were obtained, offering a sensitive system for determining changes in expression levels.

In order to determine the sequence requirements for pgs28 expression, a series of 5' deletion mutants was created either by exonuclease digestion of linearized BSpgs28-LUC plasmid, or by PCR mutagenesis. Deletions of 790 bp (FP1081), 1131 bp (FP740), 1358 bp (FP513), 1407 bp (FP464), 1485 bp (FP386), 1538 bp (FP333), 1584 bp (FP287), 1631 bp (FP240) and 1905 bp (FP+34) from BSpgs28-LUC were obtained (Fig. 1A). Additionally, an internal deletion mutant Δ376-316, which lacks the specified sequences, was created by inverse PCR. To assess the contribution of the deleted sequences to pgs28-luc transgene expression, these plasmids were transfected into sexual stage parasites as previously described [15]. Luciferase activity was assayed after 24 or 48 hours. To control for transfection efficiency, a second plasmid, pgs28-GUS, was co-transfected and luciferase light units normalized to GUS fluorescence units.
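
The normalization step can be written out as a short sketch. The code below divides luciferase light units by GUS fluorescence units for each construct and expresses the result as a percentage of a reference construct. It is a minimal illustration of the calculation as described; the function names and the numbers in the usage note are hypothetical and are not data from the study.

```python
from typing import Dict, Tuple


def normalized_activity(luc_units: float, gus_units: float) -> float:
    """Luciferase light units per GUS fluorescence unit."""
    if gus_units <= 0:
        raise ValueError("GUS signal must be positive to normalize")
    return luc_units / gus_units


def percent_of_reference(readings: Dict[str, Tuple[float, float]],
                         reference: str) -> Dict[str, float]:
    """Express each construct's normalized activity as % of a reference.

    readings maps construct name -> (luciferase units, GUS units).
    """
    ref = normalized_activity(*readings[reference])
    return {name: 100.0 * normalized_activity(luc, gus) / ref
            for name, (luc, gus) in readings.items()}
```

With made-up readings of (1000, 10) for a full-length construct and (100, 20) for a mutant, the mutant scores 5.0% of the full-length activity, because its lower raw signal is scaled by its higher co-transfection control.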

Transfection using FP1081 demonstrated that expression of the pgs28-luciferase fusion gene did not decrease significantly when the 5' most 790 bp were deleted from the parent plasmid (Fig. 1B). However, luciferase expression from FP740, where an additional 340 bp has been removed, was reduced by more than 40%. A modest decrease in promoter efficiency was observed with the removal of the next 227 bp (FP513). Further deletions of up to 180 bp (FP464, FP386, FP333) did not seem to significantly affect expression when compared to FP513. Interestingly, FP287, containing a deletion of 46 bp 3' of FP333, had less than 5% activity compared to the full-length construct. Furthermore, the internal deletion mutant Δ376-316 was also severely affected, having only 6.6% activity. As expected, a deletion that encompasses part of the pgs28 open reading frame (FP+34) abolished luciferase expression. The mutant FP240 had equivalent activity to this construct, suggesting that important elements necessary for transcription and possibly translation have been removed. Taken together, results of the 5' deletion analysis suggest that the minimal sequence necessary for pgs28 transgene expression consists of the 333 bp upstream of the translational start site. Moreover, a 17 base-pair sequence, TACCATTTGTACAGACAG, between -333 and -316, appears to be crucial, since pgs28 expression was essentially abrogated in a 5' deletion mutant and an internal deletion mutant that lack these sequences. We suggest that the proximal site corresponds to the basal promoter or initiator element, as indicated in the following section. We also suggest that positive regulatory elements lie between -1081 and -740, and perhaps within -240 and -513. This distal region likely contains an enhancer element(s) that contributes to pgs28 promoter efficiency. Thus transcriptional elements that control pgs28 appear to be bipartite, as in eukaryotic promoters and other Plasmodium genes that have been analyzed.
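
The construct names used in the deletion analysis appear to encode the length of 5' flanking sequence that remains after each deletion. The sketch below checks that reading against the deletion sizes listed in the text, using the stated 1871 bp flanking sequence. The naming rule is inferred from the listed name and size pairs, not stated by the authors, and the function name is hypothetical.

```python
FLANK_BP = 1871  # length of the cloned pgs28 5' flanking sequence

# Deletion sizes (bp) as listed in the text
DELETIONS = [790, 1131, 1358, 1407, 1485, 1538, 1584, 1631, 1905]


def construct_name(deleted_bp: int, flank_bp: int = FLANK_BP) -> str:
    """Name a 5' deletion construct by the flanking sequence it retains.

    A deletion longer than the flank runs into the coding region and
    is written with a '+' sign (for example FP+34).
    """
    remaining = flank_bp - deleted_bp
    return f"FP{remaining}" if remaining >= 0 else f"FP+{-remaining}"


if __name__ == "__main__":
    for size in DELETIONS:
        print(size, construct_name(size))
```

Every listed deletion reproduces its construct name under this rule, including FP+34, whose 1905 bp deletion extends 34 bp into the coding region.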

We  used  Northern  analysis  as  a  preliminary  step  to  map  the  transcriptional  start  site 
of  pgs28,  and  to  determine  whether  the  temporal  pattern  of  pgs28  transcription  paralleled 
that  of  its  murine  homolog  pbs21.  RNA  was  isolated  from  newly  formed  zygotes  collected 
after exflagellation, and from gametes. As seen in Fig. 2B, an intense signal appeared at a position corresponding to a message of about 1.4 kb in both zygote and gamete when probed with BBm600 (lanes 1 and 2), which extends from -381 in the 5' flanking region to within the pgs28 coding sequence. A pgs28 message of 1.5 kb has previously been reported by Duffy and colleagues [11]. Thus, while Pgs28 expression is most abundant on ookinete surfaces, pgs28 transcript can be seen as early as the zygote stage. This is in agreement with transfection studies in our laboratory using the BSpgs28 construct, as well as a pgs28-GFP fusion, that demonstrated Pgs28 expression on the surface of zygotes [17].

Recently,  the  polyadenylation  signal  of  pgs28  was  mapped  to  approximately  425bp 
downstream  of  the  stop  codon,  with  an  estimated  poly(dA)  tail  of  at  least  20  nucleotides 
[16].  Given  that  the  coding  sequence  of  pgs28  is  666bp,  the  transcription  initiation  site  of 
pgs28  would  lie  approximately  between  -390  and  -290bp.  In  agreement  with  this 
estimation,  only  the  probe  BV142,  encompassing  the  sequence  from  -381  to  -240, 
hybridized  to  the  pgs28  transcript  (Fig.  2B,  lane  7),  while  probes  corresponding  to 
sequences  further  upstream  failed  to  hybridize  to  pgs28  mRNA  (lanes  3-6).  Thus,  these 
studies establish the 5' limit of the pgs28 transcript at -381 bp upstream from the translational start site. 5' deletion analysis suggests that the transcriptional start site is
likely  to  be  downstream  of  -333bp.  Experiments  to  determine  the  precise  5'  end  of  the 
pgs28  transcript  will  resolve  this  aspect  of  pgs28  transcription. 
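
The estimated window for the transcription initiation site can be reproduced by subtracting the known parts of the message from the observed transcript size. The sketch below uses the figures given in the text: a 666 bp coding sequence, roughly 425 bp between the stop codon and the polyadenylation signal, a poly(A) tail of at least 20 nucleotides, and message sizes of about 1.4 and 1.5 kb. The exact arithmetic used by the authors is not stated, so this is one plausible reconstruction with hypothetical names.

```python
CDS_BP = 666          # pgs28 coding sequence
THREE_UTR_BP = 425    # stop codon to polyadenylation signal (approximate)
POLY_A_BP = 20        # minimum estimated tail length


def five_prime_utr(transcript_bp: int) -> int:
    """Estimate 5' UTR length by subtracting the known parts."""
    return transcript_bp - CDS_BP - THREE_UTR_BP - POLY_A_BP


if __name__ == "__main__":
    # Message sizes reported from Northern analysis (1.4 kb and 1.5 kb)
    for size in (1400, 1500):
        print(size, -five_prime_utr(size))
```

The two message sizes give start positions of about -289 and -389, which agrees with the stated range of approximately -390 to -290 bp.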

The 5' flanking sequence of pgs28 was inspected for homology to other eukaryotic transcriptional regulatory elements. The highly AT-rich region between -1081 and -520, typical of intergenic regions of Plasmodium spp., does not contain sequences
that  are  analogous  to  known  eukaryotic  regulatory  elements.  Two  GTAAT  sequences, 
demonstrated  to  be  important  for  GBP130  expression  [6],  can  be  found  in  this  region. 
Whether  an  element  associated  with  an  enhancer  of  an  asexual  stage  gene  in  P.  falciparum 
is  important  for  expression  of  pgs28,  a  sexual  stage  specific  gene,  can  only  be  determined 
by  experimental  means. 

Sequences  downstream  of  -520  have  also  been  examined.  Within  this  region  are 
two putative TATA elements, TAAAAAGAATAA and TATAAATGTTT, centered at -434 bp and -360 bp respectively from the start codon. Since these sequences can be deleted
from  the  reporter  constructs  (FP386  and  FP333)  without  drastically  affecting  expression, 
they  are  not  likely  to  be  important  for  pgs28  expression.  This  again  illustrates  that 
sequence analogy to eukaryotic promoter elements does not necessarily imply functional significance in Plasmodium genes. Inspection of the presumed 5' UTR reveals a T-rich
stretch,  constituting  up  to  74%  of  the  bases  between  130bp  and  242bp.  A  series  of  five  8- 
base  pair  inverse  repeat  elements  (T7TA7T7TATTT)  could  be  identified  within  this 
sequence.  Further  examination  of  this  region  uncovers  3  direct  repeats  of  27bp  to  29bp  in 
length.  Whether  these  sequences  have  functions  at  a  post-transcriptional  step  to  enhance 
pgs28  expression  awaits  further  experimentation.  Recently,  transfection  studies  of  pfs25 
promoter  constructs  into  P.  gallinaceum  ookinetes,  and  mobility  shift  assays  using  P. 
gallinaceum ookinete nuclear extracts, suggest that the sequence AAGGAATA, found at -403 to -396 and -483 to -476 from the initiation codon in pfs25, interacts with a nuclear
factor  and  is  important  for  expression  of  pfs25  [5].  A  similar  sequence,  AAGAATAA,  is 
found  at  -354  and  -347  in  pgs28,  within  the  putative  proximal  TATA  sequence.  Again,  the 
transfection  studies  reported  here  suggest  that  this  sequence  in  pgs28  can  be  deleted 
without  severely  affecting  pgs28  transgene  expression.  This  suggests  that  the  nuclear 
factor  PAF-1  [5]  is  not  involved  in  pgs28  transcription,  and/or  that  it  has  a  stringent 
sequence  requirement  that  the  AAGAATTT  sequence  in  pgs28  does  not  satisfy.  Even 
though pgs28 and pfs25 belong to the same family, and possess similar expression profiles
during  the  parasite  life  cycle,  they  may  not  necessarily  be  controlled  by  the  same 
evolutionarily  conserved  factors.  Nonetheless,  given  the  close  evolutionary  relationship 
between P. gallinaceum and P. falciparum, it would be of great interest to determine whether the 17 bp upstream sequence in pgs28 between -333 and -316 would be able to functionally replace the pfs25 sequence, and vice versa.

Legends


Fig.  1A.  Schematic  of  the  pgs28  5'  flanking  sequence. 

The 5' flanking sequence of pgs28 cloned into BSpgs28-LUC is shown, together with part
of the pgs28 and luc-coding sequence. To obtain BSpgs28-LUC, pgs28.1LUC [15] was
digested with HindIII and cloned into similarly digested pBluescript KS+ (Stratagene).

The bars at approximately -440 and -360 represent two putative TATA boxes. The hatched
box downstream of -240 represents the T-rich sequence with internal repeats.

Positions  of  the  5'  deletion  mutants  are  indicated.  The  numbers  refer  to  the  distance  in 
nucleotides  away  from  the  start  of  the  coding  region  (+1) . 

To generate the 5' deletion mutants FP1081, FP464, FP513, FP386 and FP+34,
BSpgs28-LUC was first digested with SacI and SpeI (New England Biolabs). The
linearized plasmid was digested further with exonuclease III/mung bean nuclease
essentially as described by the manufacturer (Stratagene). E. coli (XL-1 Blue) cells were
transformed with ligated products and the sizes of the plasmids obtained were determined
by agarose gel electrophoresis. FP464 was generated by recloning the filled-in NdeI insert
from BSpgs28-LUC into SmaI digested pBS. FP333, FP287 and FP240 were created by
PCR mutagenesis, using FP464 as template, and the upstream primers
5'GAATTCCTGCAGCCCTACCATTTGTACAGAC,
5'GAATTCCTGCAGCCCACTAGCTAAAAGAAATATG, and
5'GAATTCCTGCAGCCCAi  1T1 TATI TAA1 ll TITC respectively. The PstI site is
underlined. The downstream primer, 5'CTAGAGGATAGAATGGCGCCG, containing
an internal SfoI site (underlined), was used in all cases and was derived from the luc
coding region. Purified PCR fragments were digested with PstI and SfoI and cloned into
similarly cut FP464 vector backbone.

To generate Δ376-316, primers WFM48
5'CCATTTGTTATTGTATATAAAAAAAAAAAC and WFM20R
5  GAT  CTTC 1 T  AAT  C 1 1 1 GT  A  AAA  AT  A  ACTG, which flank the sequences to be deleted,
were used to amplify FP513 that had previously been linearized with BglII, utilizing the
TaqPlus Long PCR system (Stratagene). 30 cycles of PCR reactions were performed in
low salt buffer under the following conditions: 94°C for 1 min, 55°C for 1 min and 72°C
for 7 mins. PCR products were treated with DpnI at for 30 mins and further treated with
1 µl of Pfu for 10 cycles and incubation at 37°C for 30 mins. Amplified products were
phenol:chloroform extracted, ethanol precipitated, and resuspended. Amplified DNA
containing the deletion was allowed to circularize and transformed into E. coli (XL-1 Blue)
cells. Sequences of all clones were confirmed by DNA sequencing.


B.  Luciferase  expression  from  pgs28  5'  deletion  mutants. 

Parasites were transfected with the indicated plasmids and pgs28-GUS, and luciferase and
GUS activity assessed 24 or 48 hrs post-transfection, as described [16]. The construction
of pgs28-GUS, containing an in-frame insertion of the β-glucuronidase gene (Clontech)
within pgs28, has been described elsewhere [16]. Normalized relative light units and SD
are shown. The indicated activity is the average of 3-8 determinations.

Fig.  2A  Positions  of  DNA  probes  used  to  map  the  5’  end  of  pgs28 
transcripts. 

Probes MS135 (-786 to -651), SN188 (-651 to -463), NB82 (-463 to -381),
BV142 (-381 to -240) and BBm600 (-381 to +217) were made by digesting an XbaI
fragment of BSpgs28-LUC containing pgs28 sequences with MwoI/SwaI, SwaI/NdeI,
NdeI/BglII, BglII/VspI and BglII/BamHI restriction enzyme pairs, respectively. These
resulted in fragments of lengths indicated by the numerals in the designations.

B.  Determination  of  size  and  the  5’  end  of  pgs28  mRNA  from  P. 
gallinaceum 

RNA  was  extracted  either  from  zygotes  (lanes  1  and  3)  or  ookinetes  (lanes  2,  4-7), 
fractionated  and  Northern  blotted  using  standard  procedures.  Between  four  and  five 
micrograms total RNA obtained from 3 × 10^7 parasites were included per lane. Blots were
probed with the indicated DNA fragments, washed and autoradiographed for 24-48 hours.
Lanes 1 and 2, BBm600; lanes 3 and 4, MS135; lane 5, SN188; lane 6, NB82; lane 7,
BV142. 
Abstract


Serial  analysis  of  gene  expression  (SAGE)  was  applied  to  the  malarial  parasite, 
Plasmodium  falciparum  in  order  to  characterize  the  comprehensive  transcriptional  profile 
of  erythrocytic  stages.  A  SAGE  library  of  approximately  8335  tags  representing  4866 
different  genes  was  generated  from  3D7  strain  parasites.  BLAST  analysis  of  high 
abundance  SAGE  tags  identified  the  major  metabolic  pathways  that  are  utilized  by  the 
organism under normal culture conditions. Furthermore, several tags expressed at high
abundance (30% of tags matching to unique loci of the 3D7 genome) were derived from
previously uncharacterized open reading frames, demonstrating the use of SAGE in
genome annotation. The open platform "profiling" nature of SAGE also led to the
important discovery of a novel transcriptional phenomenon in the malarial pathogen: a
significant  number  of  highly  abundant  tags  that  were  derived  from  annotated  genes  (17%) 
corresponded  to  anti-sense  transcripts.  This  SAGE  data  was  validated  by  two 
independent  means,  strand  specific  RT-PCR  and  Northern  analysis,  where  anti-sense 
messages  were  detected  in  both  asexual  and  sexual  stages.  This  finding  has  implications 
for  transcriptional  regulation  of  Plasmodium  gene  expression. 

Introduction 

Malaria,  an  infectious  disease  caused  by  the  protozoan  parasite  Plasmodium 
falciparum,  affects  300-500  million  people  globally  each  year  (WHO,  1997).  Increasing 
drug-resistance  in  the  parasite  and  insecticide-resistance  in  the  Anopheles  vector  have 
exacerbated  this  substantial  public  health  problem.  Against  this  backdrop,  effective 
strategies  to  combat  the  disease  require  a  fundamental  knowledge  of  the  basic  biology  of 
Plasmodium in order to develop new pharmacotherapeutics and vaccines that target the
parasite. 

Most  studies  of  Plasmodium  biology  have  been  directed  at  single  genes  thought  to  be 
important  for  pathogenesis.  With  the  advent  of  genomic  technologies,  however,  new 
approaches to combat the disease, such as identifying entire repertoires of transcripts
expressed  under  different  conditions,  have  now  become  available.  Genomic  approaches 
were  initiated  with  the  sequencing  of  the  P.  falciparum  (3D7  strain)  genome,  a 
collaborative  project,  undertaken  by  the  Malaria  Genome  Consortium  that  is  already  close 
to  completion  (Butler,  1997;  Craig  et  al.,  1999;  O’Brien,  1997).  Chromosomes  2  and  3 
have  been  fully  sequenced  (Bowman  et  al.,  1999;  Gardner  et  al.,  1998)  while  eighty  to 
ninety percent of the estimated 6000 open reading frames (ORFs) in the 3D7 genome are
now  available  as  raw  sequence  data.  The  next  challenge  is  to  use  this  vast  amount  of  data 
to study the functional relevance of various genes. For example, it is now possible to
identify genes that are transcribed in different stages of the parasite's development and also
genes  that  are  induced  or  repressed  in  response  to  various  stimuli  such  as  immune-  or 
drug-  pressure.  For  this  reason,  whole  genome  expression  analyses  using  high-density 
micro-arrays  (Hayward  et  al.,  2000)  and  serial  analysis  of  gene  expression  (SAGE) 
(Munasinghe  et  al.,  2000)  have  been  developed  for  P.  falciparum.  These  new  approaches 
will  complement  each  other  to  generate  data  for  the  Plasmodium  research  community. 
Genome  sequence  will  expedite  the  micro-array  and  SAGE  analysis;  conversely,  open 
platform  profiling  techniques  such  as  SAGE  will  help  the  Malaria  Genome  Project  with 
annotation  of  previously  uncharacterized  open  reading  frames  (ORFs)  and  with  novel  gene 
discovery. 

SAGE  provides  a  sensitive  and  highly  quantitative  description  of  the  transcript  profile 
of  a  given  cell  type  (Velculescu  et  al.,  1995;  Velculescu  et  al.,  1997a).  The  SAGE 
technology  samples  short  sequence  tags  (14  bases)  from  mRNA  transcripts  in  the 
population of interest. These tags contain sufficient sequence information to identify, by
BLAST analysis, the transcript from which each was derived. The frequency of each tag in
the SAGE library is an accurate estimate of the abundance of its corresponding mRNA
transcript. Numerous groups have used this technique successfully and described the
SAGE protocol in detail (Madden et al., 1997; Matsumura et al., 1999; Polyak et al.,
1997; Velculescu et al., 1995; Velculescu et al., 1997a; Virlon et al., 1999).

In  this  report,  we  show  that  SAGE  can  be  used  to  study  gene  expression  of  the 
asexual  stages  of  P.  falciparum.  Asexual  parasites  express  many  virulence  factors  and  are 
the  targets  of  anti-malarials  such  as  chloroquine;  hence  an  in-depth  understanding  of  their 
transcriptional  profiles  will  set  the  stage  for  future  experiments  addressing  responses  to 
immune  or  drug  pressure. 

SAGE  was  successfully  applied  to  erythrocytic  stage  parasites  (3D7  strain)  of  P. 
falciparum  at  baseline  culturing  conditions,  and  a  SAGE  library  of  approximately  8000 
tags  was  generated.  A  majority  of  these  corresponded  to  unique  parasite  genes,  as 
demonstrated  by  BLAST  analysis  of  a  subset  of  tags.  The  SAGE  data  was  validated  by 
Northern  and  RT-PCR  analysis  of  genes  predicted  to  be  highly  expressed  based  on  tag 
counts.  BLAST  analysis  of  highly  abundant  tags  also  provided  insight  into  networks  of 
major  metabolic  pathways  that  are  utilized  by  the  parasite  under  normal  culture  conditions. 
Finally,  SAGE  also  revealed  the  presence  of  anti-sense  transcription  in  the  malarial 
parasite, a phenomenon that has been previously missed by other methods of transcriptional
analysis. This SAGE data was also validated by two independent methods, RT-PCR and
Northern analysis; here anti-sense transcripts for genes expressed in asexual as well as
sexual stage parasites were found. The biological role of anti-sense RNA in Plasmodium
species is unclear; the phenomenon may be involved in translational control or may reflect
mechanisms of transcriptional initiation. In summary, SAGE in Plasmodium has revealed
many  facets  of  the  basic  functioning  of  the  parasite  in  culture,  and  sets  the  stage  for  future 
comparisons  of  the  transcriptional  responses  of  P.  falciparum  to  different  stimuli. 

Materials  and  Methods 
Parasite  culture  and  RNA  extraction 

3D7  strain  parasites  were  maintained  under  standard  culturing  conditions  (Trager  and 
Jensen,  1976)  with  modifications  as  previously  described  (Munasinghe  et  al.,  2000). 
Polyadenylated  RNA  was  harvested  from  cultures  at  8%  parasitemia  (1%  rings,  5% 
trophozoites  and  2%  schizonts),  and  used  in  the  SAGE  procedure  as  previously  described 
(Munasinghe  et  al.,  2000). 

Data  analysis 

SAGE  tags  from  3D7  asexual  stages  were  analyzed  using  the  SAGE  software 
(Johns  Hopkins  University  and  Genzyme),  which  extracts  14bp  tag  counts  from  sequence 
files.  In  order  to  assign  gene  identity  to  each  tag,  the  3D7  experimental  tag  list  was 
matched  against  a  P.  falciparum  tag  database.  This  database  was  created  by  extracting 
14bp  tags  from  P. falciparum  sequence  deposited  in  GenBank  (as  of  July  13th  2000),  as 
well  as  from  a  compiled  database  of  recently  deposited  3D7  genome  sequence  (obtained 
from  the  TIGR,  Sanger  and  Stanford  sequencing  centers  and  compiled  at  the  University  of 
Pennsylvania as of July 26th 2000; kindly provided by Drs. Jessica Kissinger and David
Roos).  As  the  P.  falciparum  genome  is  not  fully  annotated,  all  potential  SAGE  tags  from 
both  sense  and  anti-sense  strands  were  extracted  (i.e.,  tags  were  extracted  from  each 
database  in  the  “genomic  mode”  rather  than  the  “cDNA  mode”). 
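The "genomic mode" extraction described above can be sketched in Python. This is a minimal illustration, not the SAGE software used in the study. It assumes that a potential tag is any CATG site plus the 10 bases that follow it, read on both strands. The function names and the toy sequence are hypothetical.

```python
ANCHOR = "CATG"        # anchoring-enzyme recognition site
EXTENSION = 10         # bases kept downstream of the anchor site
TAG_LENGTH = len(ANCHOR) + EXTENSION


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (non-ACGT characters are kept)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def extract_tags(seq):
    """Collect every full-length CATG+10 tag found on one strand, in order."""
    seq = seq.upper()
    tags = []
    start = seq.find(ANCHOR)
    while start != -1:
        tag = seq[start:start + TAG_LENGTH]
        if len(tag) == TAG_LENGTH:      # skip sites too close to the sequence end
            tags.append(tag)
        start = seq.find(ANCHOR, start + 1)
    return tags


def genomic_mode_tags(seq):
    """Potential tags from both strands, since gene orientation is unknown."""
    return {"+": extract_tags(seq), "-": extract_tags(reverse_complement(seq))}


# Toy sequence (made up for illustration only)
toy = "G" * 10 + "CATG" + "A" * 10
print(genomic_mode_tags(toy))
```

Because CATG is its own reverse complement, every site yields one candidate tag per strand, which is why an un-annotated sequence contributes tags in both orientations.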
The software output files are organized in such a way that matches to a single
locus,  matches  to  multiple  loci,  and  no  matches  to  database  sequences  can  be  readily 
determined.  For  genomic  sequence  that  is  annotated,  it  is  possible  to  assign  gene 
identification  to  each  tag  in  the  manner  outlined  above;  however,  most  of  the  available  P. 
falciparum  genome  sequence  is  not  annotated.  Therefore,  the  187  most  abundant  tags 
(abundance level of greater than 4) were characterized by manual BLASTx analysis (see
flow chart in Figure 1). Here, for tags derived from un-annotated reads, a 500-1000bp
sequence  surrounding  the  tag  was  translated  in  all  6  reading  frames  and  compared  to  the 
entire  NCBI  protein  database.  14bp  tags  that  failed  to  match  either  database  were 
analyzed  using  only  the  first  13bp  of  the  tag  sequence  in  the  manner  outlined  above. 
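The sorting of tags into single-locus, multiple-locus and unmatched classes, with the 13bp fallback, might look like the following Python sketch. The index layout, the function names and the locus labels are assumptions made for illustration and do not describe the actual software output files.

```python
from collections import defaultdict


def build_index(database_tags):
    """Map each database tag to the set of loci it was extracted from.

    database_tags: iterable of (tag, locus) pairs (hypothetical layout).
    """
    index = defaultdict(set)
    for tag, locus in database_tags:
        index[tag].add(locus)
    return index


def classify_tag(tag, index):
    """Return (category, loci) for one experimental 14bp tag."""
    loci = set(index.get(tag, set()))
    if not loci:
        # Fallback: retry with only the first 13 bases of the tag
        prefix = tag[:13]
        for db_tag, db_loci in index.items():
            if db_tag.startswith(prefix):
                loci |= db_loci
    if not loci:
        return "no match", loci
    if len(loci) == 1:
        return "single locus", loci
    return "multiple loci", loci


# Made-up database entries for illustration only
index = build_index([
    ("CATG" + "A" * 10, "locusA"),
    ("CATG" + "C" * 10, "locusB"),
    ("CATG" + "C" * 10, "locusC"),
])
print(classify_tag("CATG" + "A" * 10, index))
```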
Reverse  transcriptase  PCR 

RT-PCR  was  performed  using  the  3'  RACE  kit  (Gibco-BRL)  according  to  the 
manufacturer's protocols. First strand cDNA synthesis was primed with oligo(dT)18, while
PCR  was  performed  using  the  gene  specific  primers  described  below. 


Calmodulin (sense):       5' GTCCATCACCATCAATATCAGC 3'
Calmodulin (anti-sense):  5' CTAAGGAGTTAGGAACGGTCATG 3'
msp-3 (sense):            5' TTTTTGTGTTCTGGAACGCCTCCTCC 3'
msp-3 (anti-sense):       5' GCTTCCGAAGATGCTGAAAAAGCTGC 3'
pfg27/25 (sense):         5' TCTTGTCGTTCATGATACGCTTC 3'
pfg27/25 (anti-sense):    5' GTACAAAAGGATAGTGCCAAGCCC 3'
rap-1 (sense):            5' CTTTGAAGAAATCTCTGATTTCAGC 3'
rap-1 (anti-sense):       5' GCTTTAGAAGGTGTCTGTTCATATC 3'

PCR reactions were carried out according to the manufacturer's protocol (3'
RACE kit, Gibco-BRL). Initial denaturation of the template occurred at 94°C for 3
minutes. Amplification was performed for 5 cycles at 94°C for 45 seconds, 50°C for 45
seconds and 72°C for 45 seconds, followed by 26 cycles of identical amplification where
the annealing temperature was increased to 55°C. Finally, extension of partial PCR
products was completed at 72°C for 6 minutes.

Strand-specific RT-PCR utilized 0.5 µg of mRNA per reaction and was performed
with the express purpose of distinguishing sense mRNA from anti-sense mRNA (Yu et al.,
1995). RT-PCR was performed using the 3' RACE kit (Gibco-BRL); however, first strand
cDNA was primed with gene-specific primers that hybridize to either sense or anti-sense
messages, rather than with an oligo(dT)18 primer. The gene-specific primers are identical
to  the  primers  listed  above.  A  tenth  of  the  cDNA  sample  was  PCR  amplified,  using  the 
same  set  of  gene-specific  primers  and  amplification  conditions  described  above. 

PCR  products  were  electrophoresed  on  1.2%  agarose  gels.  All  resultant  PCR 
products  were  cloned  into  the  pCRII  vector  using  the  TA  cloning  kit  (Invitrogen)  and 
sequenced  to  confirm  the  identity  of  the  amplified  cDNA. 

Northern  blots 

Northern analysis was performed according to standard protocols. Briefly, 1 µg of
mRNA from 3D7 cultures was gel electrophoresed, blotted onto BA85 nitrocellulose
membranes (Schleicher and Schuell) and probed with gene specific DNA probes. All
probes (calmodulin, msp-3, rap-1 and pfg27/25) were derived from the RT-PCR products
described in the previous section. DNA probes were radiolabeled with α-32P dATP using
random hexanucleotides and the Klenow fragment of DNA polymerase. Blots were
visualized  by  autoradiography. 

For strand-specific northerns, 20 µg of total RNA were used per blot as described
above. Probes for strand-specific Northern analysis were generated in the following
manner. RT-PCR products of calmodulin and msp-3 cDNAs (see previous sections) were
cloned into the pCRII vector and then sub-cloned into pBluescript. The orientation of
calmodulin and msp-3 genes within pBluescript was determined by sequencing. The
template for synthetic RNA corresponding to the sense strand was obtained by digestion
of each pBluescript plasmid with XhoI, while the template for synthetic RNA
corresponding to anti-sense RNA was obtained by digestion of the pBluescript plasmids
with BamHI. Synthetic RNAs corresponding to the sense or anti-sense strand of either
gene were obtained by in vitro transcription reactions (performed according to standard
protocols). Strand specific RNA probes were also obtained under the same conditions in
the presence of α-32P ATP.

Quantitative  northern  analysis  was  carried  out  for  calmodulin  and  msp-3  to 
determine  whether  the  ratio  of  their  transcripts  was  comparable  to  that  determined  by 
SAGE.  Northern  blots  and  gene  specific  DNA  probes  were  prepared  as  described  above. 
Known  amounts  of  synthetic  300bp  RNA  fragments  from  each  gene  were  run  alongside 
the mRNA sample as markers for quantification. Blots were exposed to X-ray film (Kodak
X-OMAT) such that the intensity of the signal was within the linear range of the film.
Signal intensities for each of the transcripts in the mRNA sample were converted to molar
amounts by reference to those of the synthetic RNAs. Signal intensities were measured by
scanning the X-ray film into Adobe Photoshop, and utilizing NIH Image software to
quantify bands by pixel density.
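The conversion of band intensity to molar amount can be illustrated with a short Python sketch. It assumes a straight-line standard curve fitted by least squares to the synthetic RNA markers, which is one plausible reading of "by reference to those of the synthetic RNAs" rather than the documented procedure. All numbers and units below are placeholders, not measurements from this study.

```python
def fit_standard_curve(amounts, intensities):
    """Least-squares fit of intensity = slope * amount + intercept."""
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(intensities) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, intensities))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def intensity_to_amount(intensity, slope, intercept):
    """Invert the standard curve to estimate the amount behind a band."""
    return (intensity - intercept) / slope


# Placeholder standards: known amounts (arbitrary molar units) and pixel densities
standard_amounts = [1.0, 2.0, 4.0]
standard_signals = [110.0, 210.0, 410.0]

slope, intercept = fit_standard_curve(standard_amounts, standard_signals)
sample_amount = intensity_to_amount(310.0, slope, intercept)
print(round(sample_amount, 3))
```

In practice each gene would need its own curve built from its own synthetic RNA, and only bands within the linear range of the film should be converted.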

P. gallinaceum gamete preparation and RNA isolation

P. gallinaceum parasites were propagated in White Leghorn chickens by serial
injection into wing veins. At parasitemias of 50-70%, blood was withdrawn by heart
puncture. Gametogenesis was induced as described previously (Goonewardene et al.,
1993), with the inclusion of xanthurenic acid (Sigma) at a final concentration of 50 µM in
the exflagellation buffer. Gametes and zygotes were purified, also as described previously
(Goonewardene et al., 1993), and 1 × 10^7 cells were incubated at 25°C in Medium 199
(Gibco-BRL) and harvested for analysis at 0, 24, and 48 hours after isolation. Total RNA
was isolated using Tri reagent (Molecular Research Center, Inc.) according to the
manufacturer's protocol. Total RNA obtained from 1 × 10^7 parasites was used for each
RT-PCR  reaction.  Strand-specific  RT-PCR  was  performed  as  described  previously  with 
the  following  primers: 

pgs28 (sense):       5' CATCTAGCATAGTCAGCACAAGGTTTATTTG 3'

pgs28 (anti-sense):  5' CAAACGAAGATTATTTAGTCAAAC 3'

Results 

3D7  SAGE  tag  library  from  asexual  blood  stage  parasites 

A  total  of  8335  SAGE  tags  were  analyzed  from  the  asexual  blood  stages  of 
P. falciparum,  3D7  strain.  A  preliminary  analysis  showed  that  these  8335  tags 
corresponded  to  4798  unique  genes  (Figure  2A).  Of  these,  1254  genes  were  present  at  an 
abundance  of  two  or  greater.  The  537  tags  expressed  at  abundance  levels  greater  than  or 
equal to 20 tags (percentage frequency of 0.2) accounted for 6.4% of the total mRNA
mass but only 0.3% (15) of the total number of unique genes. As expected, these
abundance groups had the highest percentage of matches to GenBank entries (Figure 2B),
implying that many highly expressed messages have been readily cloned and studied. The
lower abundance tags (frequency of less than 20 tags) accounted for 93.6% of the total
mRNA mass, and represented a vast majority of the unique genes expressed in the
parasite.  Moreover,  these  tags  gave  many  fewer  matches  to  GenBank;  hence  SAGE  in 
Plasmodium  falciparum  will  aid  in  the  discovery  of  novel  malarial  genes. 
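The percentages quoted in this paragraph follow directly from the tag counts, as the Python sketch below shows. The `summarize` helper and its toy input are illustrative assumptions. The only figures taken from the text are 8335 total tags, 4798 unique tags, and 537 tags from 15 genes at an abundance of 20 or more.

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


def summarize(tag_counts, threshold=20):
    """Summarize a {tag: count} library around an abundance threshold."""
    total = sum(tag_counts.values())
    high = {tag: n for tag, n in tag_counts.items() if n >= threshold}
    return {
        "total_tags": total,
        "unique_tags": len(tag_counts),
        "pct_tags_high": percent(sum(high.values()), total),
        "pct_unique_high": percent(len(high), len(tag_counts)),
    }


# Arithmetic check using the counts reported in the text
TOTAL_TAGS, UNIQUE_TAGS = 8335, 4798
HIGH_TAGS, HIGH_UNIQUE = 537, 15

print(round(percent(HIGH_TAGS, TOTAL_TAGS), 1))               # share of all tags
print(round(percent(HIGH_UNIQUE, UNIQUE_TAGS), 1))            # share of unique tags
print(round(percent(TOTAL_TAGS - HIGH_TAGS, TOTAL_TAGS), 1))  # remaining tags
print(round(percent(20, TOTAL_TAGS), 1))                      # frequency of 20 copies
```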

BLAST  analysis  of  SAGE  tags 

To  assess  whether  14bp  tags  could  uniquely  identify  genes  in  the  highly  A-T  rich 
Plasmodium  genome,  these  SAGE  tags  were  searched  against  3D7  genome  sequence.  We 
decided that for an accurate estimate of the “tag to gene” mapping in Plasmodium, all
available  sequence  data,  both  cDNA  and  genomic,  would  provide  the  most  complete 
picture.  Sequencing  of  the  P.  falciparum  genome  is  close  to  completion;  however,  much 
of  the  newly  available  P.  falciparum  sequence  data  has  yet  to  be  annotated.  Therefore,  the 
187  most  abundant  SAGE  tags  were  analyzed  in  a  more  rigorous  manner  by  BLASTx 
analysis.  A  schematic  of  the  BLAST  analysis  is  shown  in  Figure  1.  This  analysis  revealed 
that  a  majority  of  the  SAGE  tags  (88%)  corresponded  to  P.  falciparum  genome  sequence. 
Most of the tags that match to single loci (70%) lie within known genes; hence SAGE tags
can be used to uniquely identify genes in Plasmodium. The other 30% of tags that match
single sites correspond to unknown genes and hypothetical open reading frames. Thus
SAGE data reveals not only predicted ORFs that are expressed but also previously
uncharacterized ORFs; hence SAGE in Plasmodium has the capacity to assist in
annotation  of  the  genome. 

Approximately  10%  of  the  187  most  abundant  SAGE  tags  did  not  match  parasite 
sequence.  We  expect  this  number  to  decrease  as  the  genome  project  nears  completion. 

The  percentage  of  SAGE  tags  that  gave  multiple  matches  within  the  P.  falciparum 
genome was also calculated and found to be 18%. This number is about four-fold higher
than that obtained by Velculescu et al. (1995) in a SAGE library of human pancreatic
tissue. In the present study, the 35 tags that matched more than one locus were further
investigated; of these tags, 21 (60%) matched 2 or 3 genes while 14 (40%) matched
greater  than  3  genes.  The  latter  set  of  tag  sequences  was  of  lower  complexity  in  general. 
Northern  Blot  analysis  should  help  resolve  whether  tags  that  match  multiple  genes  indeed 
represent  multiple  transcripts. 
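One way to see why low-complexity tags are more likely to match several loci in an A-T rich genome is a back-of-envelope probability estimate, sketched here in Python. The model treats bases as independent draws. The A+T fraction of 0.8 is an assumed illustrative value and is not a figure reported in this study.

```python
def kmer_probability(kmer, at_fraction):
    """Chance that a random position spells `kmer`, assuming independent bases.

    A and T each occur with probability at_fraction / 2,
    G and C each with probability (1 - at_fraction) / 2.
    """
    p_at = at_fraction / 2.0
    p_gc = (1.0 - at_fraction) / 2.0
    prob = 1.0
    for base in kmer.upper():
        prob *= p_at if base in "AT" else p_gc
    return prob


AT_FRACTION = 0.8  # assumption for illustration

at_only = kmer_probability("AAAAAAAAAA", AT_FRACTION)
balanced = kmer_probability("ACGTACGTAC", AT_FRACTION)
print(at_only / balanced)  # how much more often the A-T-only 10-mer is expected
```

Under these assumptions, a 10-base extension made only of A and T is expected far more often by chance than a mixed one, so such tags are less likely to be unique.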

Abundant  transcripts  expressed  in  P.  falciparum  grown  in  culture 

The  BLAST  analysis  described  earlier  enabled  us  to  assign  genes  to  highly  abundant 
SAGE  tags;  examples  of  these  are  listed  in  Table  1.  This  analysis  provided  a  snapshot  of 
the  major  transcripts  expressed  by  the  parasite.  A  complete  picture  of  metabolic  pathways 
utilized  by  P.  falciparum  growing  in  culture  will  incorporate  protein  expression  and 
stability;  nevertheless,  BLAST  analysis  of  abundant  SAGE  tags  provides  the  first  global 
description  of  genes  and  hence,  metabolic  pathways  that  might  be  transcriptionally 
regulated.  The  most  abundant  transcripts  were  grouped  into  functional  categories  to 
reveal  the  transcriptional  profile  of  3D7  parasites  grown  in  culture  (Figure  3).  Many  tags 
represented  housekeeping  functions  carried  out  by  all  prokaryotic  and  eukaryotic  cells 
(transcription,  translation,  chaperones,  cytoskeleton,  etc.)  while  some  functional  classes 
were highly specific for the unique life cycle of Plasmodium (membrane associated
proteins  involved  in  invasion,  DOXP  pathway). 

Interestingly,  many  of  the  most  abundant  messages  (5.3%)  appear  to  be  transcribed 
from  the  6kb  mitochondrial  genome  and  another  2.1%  (thioredoxin,  vacuolar  ATPase 
subunit  B,  ATPase  transporter,  ubiquinol  cytochrome-c  reductase  like  protein)  are 
required  for  oxidative  metabolism.  Therefore  a  significant  proportion  of  abundant 
transcripts  encode  proteins  that  are  dedicated  towards  redox  processes. 

Stage-specific  transcripts  are  highly  represented  in  the  list  of  abundant  messages, 
reflecting  the  different  developmental  stages  present  in  the  culture.  For  example,  mRNAs 
encoding cell surface proteins involved in merozoite invasion (Cowman et al., 2000)
comprise 8% of the most abundant transcripts. These include merozoite surface proteins 3
and 4 (MSP-3 and -4), rhoptry associated protein-1 (RAP-1) and merozoite capping
protein. Tags corresponding to serine repeat antigen, a soluble protein that is associated
with the parasitophorous vacuole, were found at high abundance (0.32%). Also present at
high abundance (0.25%) is a tag representing the gametocyte surface antigen Pfg27/25,
shown  to  be  essential  for  gametogenesis  (Lobo  et  al.,  1999). 

Abundant  SAGE  tags  represented  major  metabolic  pathways  of  the  malarial  parasite. 
As  asexual  blood  stages  of  Plasmodium  do  not  store  energy  reserves  in  the  form  of 
glycogen  or  lipids,  glucose  taken  up  from  plasma  is  the  primary  source  of  energy 
(Sherman,  1991).  Therefore,  glucose  metabolism  is  a  prominent  aspect  of  intracellular 
growth  and  not  unexpectedly,  proteins  required  for  glucose  metabolism  were  represented 
among  the  abundant  tags  (aldolase,  PEP  carboxykinase,  and  triosephosphate  isomerase). 
Although lipids are not utilized as a major source of energy by P. falciparum, there is
a  significant  increase  in  levels  of  phospholipids,  diacylglycerol  and  triacylglycerol  within 
the  red  blood  cell  upon  merozoite  invasion  (Vial  and  Ancelin,  1998).  This  increase  in  the 
total  lipid  content  is  associated  with  a  biosynthetic  requirement  for  lipids  during  formation 
of  the  membranes  surrounding  the  parasite  (the  parasitophorous  vacuolar  membrane  and 
the  tubovesicular  membrane).  N-myristoyl  transferase,  an  enzyme  that  plays  a  role  in  the 
formation  of  lipoproteins,  was  found  among  the  187  most  abundant  tags;  however,  tags 
representing  proteins  involved  in  lipid  biosynthesis  were  not  present. 

Intra-erythrocytic P. falciparum parasites are capable of de novo synthesis of
pyrimidines  from  precursor  molecules  (Walsh  and  Sherman,  1968),  with  a  requirement  for 
para-aminobenzoic  acid  (pABA)  and  folate  cofactors.  Unlike  their  hosts,  malarial  parasites 
do  not  use  exogenous  folate  cofactors,  but  instead,  synthesize  these  de  novo  (Scheibel  and 
Sherman,  1988).  SAGE  data  revealed  tags  corresponding  to  ribonucleotide  reductase,  an 
enzyme  of  the  pyrimidine  biosynthetic  pathway  and  dihydrofolate  synthase,  an  enzyme  of 
the  folate  pathway.  Polyamine  biosynthetic  enzymes  were  also  represented  among  the 
SAGE  tags  (ornithine  decarboxylase  and  ornithine  aminotransferase). 

The  unique  intracellular  niche  of  malarial  parasites  results  in  the  expression  of  many 
parasite-specific  metabolic  pathways.  For  example,  growth  of  the  asexual  parasites  within 
red  blood  cells  is  accompanied  by  degradation  of  hemoglobin  and  the  subsequent 
detoxification of heme by-products (Foley and Tilley, 1998; Krogstad and De, 1998;
Rosenthal  and  Meshnick,  1998).  Tags  representing  proteins  implicated  in  the 
detoxification  of  heme  (histidine-rich  proteins  I  and  II,  glutathione  reductase)  were  found 
at high abundance in the SAGE library. Surprisingly, the plasmepsin and falcipain
proteases, which play a role in hemoglobin degradation, were not found in the list of highly
expressed  genes.  This  may  be  related  to  the  protein  stability  of  these  factors  or  may  be 
due  to  the  fact  that  their  transcription  occurs  at  an  earlier  stage  in  the  parasite  life  cycle 
than  the  trophozoite  stage,  which  was  the  predominant  stage  in  the  study  population. 

Finally, SAGE data revealed the expression of mRNA encoding deoxy-D-xylulose
5-phosphate synthase (DOXP synthase) at high levels (0.09%). The DOXP pathway was
recently identified as a parasite-specific metabolic pathway important for isoprenoid
biosynthesis (Jomaa et al., 1999). As this pathway is localized in the apicoplast, a
plant-derived organelle of Plasmodium, DOXP metabolism provides a novel target for
anti-malarial drug development.

Validation  of  SAGE  data 

In  order  to  confirm  the  expression  data  in  asexual  stage  parasites  as  determined  by 
SAGE,  RT-PCR  and  Northern  analysis  of  several  genes  with  highly  abundant  SAGE  tag 
counts (calmodulin, msp-3, rap-1, and pfg27/25; see Figure 4) were performed. Pfg27/25
represents  a  gametocyte-specific  antigen,  while  the  other  three  are  predicted  to  be 
expressed  in  asexual  stages.  As  the  SAGE  library  was  derived  from  a  culture  that 
contained  no  detectable  gametocytes,  pfg27/25  was  specifically  chosen  for  RT-PCR  and 
Northern  analysis.  RT-PCR  products  for  all  four  genes  were  generated  from  asexual  stage 
mRNA  (Figure  4A).  These  were  cloned,  sequenced  and  found  to  correspond  to  the 
expected  gene.  Transcripts  at  the  predicted  length  for  all  four  genes  were  also  detected  by 
Northern blotting (data not shown, see Figure 4B).

For a more quantitative estimate of gene expression, quantitative Northern analysis of
two highly expressed genes (msp-3 and calmodulin) was performed (data not shown, see
Figure 4B). Here the molar ratio of msp-3 to calmodulin was approximately 3:1, which is
similar  to  the  ratio  of  their  SAGE  tag  counts  (Figure  4B).  Hence,  SAGE  tag  data  appears 
to  correlate  well  with  levels  of  mRNA  within  the  cells. 

Anti-sense transcripts

A  surprising  observation  of  SAGE  in  P.  falciparum  was  the  large  proportion  of 
tags  corresponding  to  anti-sense  transcripts.  Unlike  microarrays,  SAGE  is  able  to  detect 
anti-sense  transcription  since  the  orientation  of  the  SAGE  tag  on  the  mRNA  can  be  readily 
determined. A SAGE tag consists of the 4 bp recognition sequence (CATG) of the
restriction enzyme NlaIII (this enzyme defines the position of each tag in an mRNA
transcript) and 10 bp of adjacent sequence in the direction of the 3’ poly A tail of the RNA
molecule. Among 45 annotated genes whose 5’ and 3’ ends are clearly denoted, 17% of
the tags consisted of a CATG and the 3’ adjacent 10 bp, in the direction of the 5’ end of
the transcript, on the non-coding strand of cDNA. This result was unexpected; hence, we
wanted  independent  confirmation  of  the  SAGE  data.  This  was  accomplished  by  strand 
specific  RT-PCR  analysis  of  asexual  as  well  as  sexual  blood  stages,  and  strand  specific 
Northern  analysis  in  erythrocytic  stage  parasites. 
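
The way tag orientation distinguishes sense from anti-sense transcripts can be sketched in a few lines. This is an illustration under stated assumptions, not the SAGE software used in this work: the function names and the toy sequence are hypothetical, and the anti-sense transcript is assumed to be the exact reverse complement of the sense transcript over the same region. Because CATG is its own reverse complement, the anchoring site sits at the same position on both strands, and only the adjacent 10 bp differs.

```python
ANCHOR = "CATG"   # NlaIII recognition sequence
TAG_LEN = 10      # bases taken downstream of the anchoring site

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def sage_tag(transcript):
    """Return CATG plus the 10 bp downstream of the 3'-most CATG.

    Returns None if there is no CATG or fewer than 10 bases follow it.
    """
    pos = transcript.rfind(ANCHOR)
    if pos == -1:
        return None
    tag = transcript[pos:pos + len(ANCHOR) + TAG_LEN]
    return tag if len(tag) == len(ANCHOR) + TAG_LEN else None


def tag_orientation(tag, sense_transcript):
    """Classify an observed tag relative to a gene's sense transcript."""
    if tag == sage_tag(sense_transcript):
        return "sense"
    if tag == sage_tag(reverse_complement(sense_transcript)):
        return "anti-sense"
    return "unassigned"


# Toy transcript (hypothetical), written 5' to 3' on the coding strand.
sense = "GGAATTCCAACATGACGTTGCAATGG"
print(sage_tag(sense))                            # CATGACGTTGCAAT
print(sage_tag(reverse_complement(sense)))        # CATGTTGGAATTCC
print(tag_orientation("CATGTTGGAATTCC", sense))   # anti-sense
```

In this toy case the anti-sense tag is the CATG followed by the reverse complement of the 10 bp that lie 5’ of the site on the coding strand, which is the pattern described above.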

We  confirmed  the  presence  of  anti-sense  transcripts  from  erythrocytic  stages  by 
strand-specific RT-PCR analysis of three genes, calmodulin, rap-1 and msp-3, and
subsequent sequencing of the RT-PCR products to establish gene identity. Based on
SAGE data, all three transcripts were expected to be present in both the sense and
anti-sense orientations, a prediction that was confirmed by RT-PCR (Figure 5A, lanes 5-16)
and sequence analysis. Although a PCR product was detected for pfg27/25 anti-sense
RNA (Figure 5A, lane 1), the sequence of the PCR product did not correspond to
pfg27/25, consistent with the absence of an anti-sense SAGE tag for this gene.

Importantly,  control  experiments  that  excluded  reverse  transcriptase  (lanes 
2,4,6,8,10,12,14,16)  indicated  a  lack  of  contaminating  genomic  DNA,  showing  that  the 
PCR  products  obtained  during  strand-specific  RT-PCR  were  indeed  derived  from  RNA. 
These  data  validate  the  anti-sense  transcripts  predicted  by  SAGE. 

The  presence  of  anti-sense  transcripts  was  also  confirmed  by  strand-specific 
Northern  analysis  for  calmodulin  and  msp-3.  Figure  5B  shows  that  strand-specific  probes 
can  specifically  detect  synthetic  anti-sense  RNA  (lanes  1  and  2  for  calmodulin;  lanes  7  and 
8 for msp-3) or synthetic sense RNA (lanes 4 and 5 for calmodulin; lanes 10 and 11 for
msp-3). Using these strand-specific probes, total RNA isolated from asexual stage
parasites was shown to contain both anti-sense and sense transcripts for both calmodulin
(lanes  3  and  6)  and  msp-3  (lanes  9  and  12).  Therefore,  as  confirmed  by  two  independent 
techniques,  the  presence  of  anti-sense  tags  in  the  SAGE  library  reflects  anti-sense 
transcription  in  asexual  stages  of  the  malarial  parasite. 

We  wondered  whether  genes  expressed  in  other  stages  of  the  Plasmodium  life 
cycle  also  exhibited  anti-sense  transcription.  To  address  this,  the  sexual  stages  (zygotes 
and ookinetes) of the chicken malarial parasite, P. gallinaceum, were tested for the
presence  of  anti-sense  RNAs.  Pgs28  is  a  major  surface  antigen  of  P.  gallinaceum  sexual 
stages  (Duffy  et  al.,  1993)  and  transcription  of  the  pgs28  gene  has  been  studied  previously 
(29).  Strand-specific  RT-PCR  of  total  RNA  from  zygotes  (0  hours)  and  mature  ookinetes 
(48  hours)  showed  that  the  pgs28  gene  expressed  both  sense  and  anti-sense  transcripts 
(Figure  6)  at  different  stages  of  in  vitro  development. 


Discussion 


This  report  demonstrates  the  application  of  SAGE  in  P.  falciparum.  Despite  the 
low complexity of the genome, SAGE tags as short as 14 bp can uniquely identify a
majority  of  genes  in  P.  falciparum.  This  observation  has  been  exploited  to  study 
transcription  in  the  asexual  stages  of  the  parasite,  resulting  in  new  insights  into  the  biology 
of  the  pathogen.  First,  we  provide  a  description  of  the  transcriptional  profile  of  the  3D7 
strain  of  P.  falciparum  that  builds  upon  the  extensive  data  generated  by  the  Malaria 
Genome  Project.  Second,  the  major  metabolic  pathways  present  in  blood  stage  parasites 
are  delineated;  modulation  of  these  pathways  in  response  to  stimuli  like  drug-  and 
immune-pressure  can  now  be  studied.  And  finally,  this  report  shows  that  Plasmodium 
parasites  express  anti-sense  RNAs  at  multiple  stages  during  the  developmental  cycle,  a 
finding  that  has  implications  for  transcriptional  regulation  of  Plasmodium  gene  expression. 

Analysis of SAGE tags

Of  the  tags  that  matched  to  single  loci,  70%  matched  to  known  genes  while  30% 
matched  to  unknown  genes  or  hypothetical  proteins.  This  distribution  is  in  stark  contrast 
to  genome  sequencing  data  where  60%  of  the  putative  ORFs  were  of  unknown  function 
while  40%  were  genes  encoding  proteins  of  known  functions  (Gardner  et  al.,  1998).  This 
discrepancy  could  be  explained  by  the  fact  that  the  asexual  blood  stages  are  more 
amenable  to  cultivation  and  experimental  manipulation  in  the  laboratory  than  other  stages; 
hence,  many  of  the  transcripts  expressed  in  these  stages  have  been  previously  studied  and 
are  of  known  functions.  It  is  also  likely  that  a  majority  of  the  transcripts  expressed  during 
laboratory  culture  of  asexual  blood  stages  encode  proteins  that  serve  housekeeping 
functions  conserved  within  organisms  widely  separated  on  the  phylogenetic  tree.  The 
genes of unknown function identified by the Malaria Genome Project may turn out to be
of  importance  in  host-parasite  interactions  and  disease;  however,  under  culturing 
conditions  only  relatively  few  may  be  expressed  at  high  levels.  Moreover,  as  SAGE  data 
reveals  genes  that  are  actually  expressed  in  asexual  stage  parasites,  identification  of  tags 
that  correspond  to  unknown  genes  and  hypothetical  proteins  will  be  of  tremendous  use  in 
annotation  of  the  P.  falciparum  genome. 

Some  tags  (10%)  did  not  match  to  the  Plasmodium  databases.  As  the  P.  falciparum 
genome  is  80-90%  complete,  these  tags  should  prove  to  be  informative  as  the  genome 
project  proceeds  to  completion.  Alternatively,  tags  that  do  not  match  genome  sequence 
may turn out to span splice junctions. These questions should be resolved as more
genome  sequence  becomes  available.  Nevertheless,  SAGE  in  P.  falciparum  is  comparable 
to  other  studies  where  tags  with  no  matches  to  the  genome  were  as  high  as  20% 
(Matsumura  et  al.,  1999)  and  23%  (Yamashita  et  al.,  2000)  of  the  total  tags. 

Finally,  of  the  8335  tags,  18%  gave  multiple  matches  to  Plasmodium  databases,  a 
number  that  is  four-fold  higher  than  that  obtained  from  human  pancreatic  SAGE  libraries, 
where ~5% of tags gave multiple matches (Velculescu et al., 1995). However, pancreatic
SAGE  tags  were  only  searched  against  RNA  sequence  databases,  in  contrast  to  our  more 
extensive  analysis  that  surveyed  all  available  Plasmodium  genome  sequence.  Hence,  the 
higher  percentage  of  multiple  matches  to  the  genome  may  reflect  the  method  of  analysis 
rather  than  any  limitation  of  the  technique  when  applied  to  the  A-T  rich  genome  of 
Plasmodium.  Alternatively,  the  higher  percentage  of  tags  giving  multiple  matches  may  be 
a  consequence  of  the  lower  complexity  of  the  Plasmodium  genome.  Ambiguous  tags  of 
interest  can  be  investigated  further  on  an  individual  basis  by  Northern  analysis. 
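
The three outcomes discussed in this section (single match, no match, multiple matches) can be made concrete with a small sketch. It is a simplified illustration, not the SAGE software used in this work: the function names, the toy database, and the toy tags are hypothetical, and the choice to search both strands of each database entry is an assumption.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def matching_loci(tag, database):
    """Names of database entries containing the tag on either strand."""
    rc = reverse_complement(tag)
    return [name for name, seq in database.items() if tag in seq or rc in seq]


def classify_tag(tag, database):
    """Label a tag by the number of distinct loci it matches."""
    n = len(matching_loci(tag, database))
    if n == 0:
        return "no match"
    if n == 1:
        return "single match"
    return "multiple matches"


def summarize(tags, database):
    """Percentage of tags falling into each match category."""
    counts = Counter(classify_tag(t, database) for t in tags)
    return {category: 100.0 * n / len(tags) for category, n in counts.items()}


# Toy database and tags (hypothetical sequences).
database = {
    "locus_A": "TTTCATGACGTTGCAATTT",
    "locus_B": "GGGCATGACGTTGCAATGGG",
    "locus_C": "AAACATGTTTTTTTTTTAAA",
}
tags = ["CATGACGTTGCAAT", "CATGTTTTTTTTTT", "CATGGGGGGGGGGG"]

for tag in tags:
    print(tag, classify_tag(tag, database))
print(summarize(tags, database))
```

Searching a larger or lower-complexity sequence collection raises the chance that a short tag occurs at more than one locus, which is the effect considered above for the multiple-match class.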


Metabolic  pathways  defined  by  SAGE 

Other  reports  on  SAGE  have  revealed  metabolic  profiles  that  are  highly  specific  to 
the  organism  or  tissue  under  study.  For  example,  SAGE  of  mouse  kidney  revealed  a 
preponderance  of  ion  channels  and  mitochondrial  enzymes,  consistent  with  the  role  of  the 
kidney  in  filtration  and  solute  transport  and  the  high  energy  requirement  for  the  same  (El- 
Meanawy  et  al.,  2000).  Transcriptional  profiling  of  the  100  most  abundant  SAGE  tags 
derived  from  seedlings  of  the  rice  plant,  Oryza  sativa  L.,  demonstrated  a  prevalence  of 
prolamin,  a  storage  protein  expressed  in  seeds  (Matsumura  et  al.,  1999).  Not 
unexpectedly, other highly abundant transcripts included those encoding water channels
and  respiratory  metabolism  enzymes. 

SAGE  data  from  P.  falciparum  sheds  light  on  the  transcriptional  profile  of  blood 
stage  parasites  and  hence  reveals  the  classes  of  proteins  and  metabolic  pathways  that  are 
utilized  during  asexual  growth.  For  example,  membrane-associated  proteins  form  the  most 
abundant  category  of  expressed  proteins.  This  is  not  surprising  in  light  of  the  fact  that  the 
parasite  is  separated  from  its  extracellular  environment  by  three  separate  membranes:  the 
host  red  blood  cell  membrane,  the  parasitophorous  vacuole  membrane,  and  the  parasite 
plasma  membrane.  Many  of  these  highly  expressed  proteins  are  stage  specific  and  have 
been  previously  shown  to  be  important  in  invasion  of  the  red  blood  cell  (merozoite  surface 
proteins-3  and  -4);  others  are  transporters  that  may  import  nutrients  into  the  parasite  cell 
(importin β-subunit). Hence, the unique niche of the malarial parasite within the red blood
cell  requires  the  high  expression  of  specific  surface  proteins. 

A  significant  proportion  (7.4%)  of  the  most  abundant  tags  were  derived  either 
from  transcripts  encoded  on  the  6kb  mitochondrial  genome  or  from  nuclear  encoded 
transcripts involved in redox metabolism. High levels of RNA synthesis from the 6kb
element  may  reflect  the  fact  that  this  episomally  replicating  molecule  is  present  at 
approximately  20  copies  per  cell  (Preiser  et  al.,  1996).  However,  the  high  abundance  of 
nuclear  encoded  transcripts  also  involved  in  redox  metabolism  (thioredoxin,  ubiquinol 
cytochrome c-reductase like protein) indicates that a large proportion of the cell's
metabolic  activities  involve  the  maintenance  of  intracellular  oxidative  homeostasis. 
Moreover,  SAGE  data  show  that  transcripts  encoding  the  molecular  chaperones  Hsp-60 
and  -70,  which  may  be  involved  in  import  of  nuclear  encoded  proteins  into  the 
mitochondria  (Das  et  al.,  1997),  are  also  expressed  at  high  levels.  Hence,  mitochondrial 
functions  are  most  highly  represented  in  the  abundant  classes  of  SAGE  tags,  likely 
reflecting  the  micro-aerophilic  lifestyle  of  the  parasite  within  the  red  blood  cell.  The  robust 
expression of genes involved in mitochondrial physiology may explain why mitochondrial
pathways  have  been  excellent  targets  for  anti-malarial  drugs. 

The  major  transcriptional  pathways  in  the  parasite  as  revealed  by  SAGE  will  help 
to  identify  potential  drug  targets  and  lead  compounds.  For  example,  atovaquone  inhibits 
erythrocytic growth by targeting the mitochondrial cytochrome bc1 complex (Fry and
Pudney,  1992).  Further  evidence  that  other  highly  expressed  metabolic  pathways  could 
also  serve  as  drug  targets  is  found  in  the  following  studies:  the  anti-malarial  drug 
fosmidomycin  has  been  shown  to  target  DOXP  metabolism  (Jomaa  et  al.,  1999);  the 
ornithine  decarboxylase  inhibitor,  difluoro-methylornithine,  inhibits  erythrocytic  growth  of 
P.  falciparum  in  culture  (Assaraf  et  al.,  1984);  and  folate  antagonists  like  pyrimethamine 
and cycloguanil target dihydrofolate reductase (Ferone et al., 1969). Other major
transcriptional patterns uncovered by SAGE in the parasite (proteasome, chaperones,
unknown ORFs, etc.) may provide new targets for anti-malarial drug development.

SAGE reveals novel transcriptional phenomena in P. falciparum

Most  techniques  for  global  analysis  of  gene  expression  are  unable  to  distinguish 
sense and anti-sense transcripts. Due to the directional nature of SAGE tags (the 3’-most
NlaIII site of each transcript (4 bp) and 10 bp downstream on the coding strand), we were
able to identify numerous anti-sense transcripts in the transcriptional repertoire of P.
falciparum asexual stage parasites. Strand-specific RT-PCR and Northern analysis
confirmed this observation for three of the genes (msp-3, rap-1 and calmodulin) predicted
to transcribe anti-sense messages. The fact that anti-sense transcription can be detected in
Plasmodium by three independent methods suggests that this is a bona fide biological
phenomenon  and  not  an  artifact  of  the  SAGE  procedure.  It  is  of  interest  to  note  that  anti- 
sense  transcripts  in  genes  that  contain  introns  are  larger  than  their  corresponding  sense 
RNAs  suggesting  that  sequences  complementary  to  introns  are  present  in  the  former. 

Our data also demonstrate that anti-sense transcripts are expressed in other stages
of  Plasmodium  development.  The  pgs28  gene  that  encodes  a  major  surface  antigen  of  P. 
gallinaceum  sexual  stages  has  been  studied  extensively  (Duffy  et  al.,  1993).  Transcription 
of pgs28  is  restricted  to  the  zygotes  and  ookinetes.  Strand-specific  RT-PCR  shows  that 
pgs28  expresses  both  sense  and  anti-sense  transcripts  in  both  stages.  Hence,  the  presence 
of  anti-sense  transcripts  may  be  a  widespread  phenomenon  in  multiple  stages  of 
Plasmodium development and should be tested further. For example, a family of genes
(var) encodes variable surface proteins involved in host-parasite interactions; var genes
are transcribed during erythrocytic growth resulting in the expression of the PfEMP-1
protein (Wahlgren et al., 1999). Several var genes are transcribed in the ring stages while
a  single  var  gene  is  transcribed  in  trophozoites  (Chen  et  al.,  1998;  Scherf  et  al.,  1998).  It 
would  be  interesting  to  test  whether  any  of  the  ring  stage  var  transcripts  are  anti-sense. 

Anti-sense  transcripts  may  reflect  mechanisms  of  transcriptional  initiation  in  a 
parasite  with  a  highly  A-T  rich  genome  (86%  A-T  in  non-coding  regions  and  76%  in 
coding  sequence)  (Bowman  et  al.,  1999).  Numerous  studies  have  shown  that  transcription 
in  P.  falciparum  is  initiated  from  the  A-T  rich  5’  upstream  region  of  genes  resulting  in 
sense  transcripts  (Dechering  et  al.,  1999;  Horrocks  et  al.,  1998;  Horrocks  and  Lanzer, 
1999).  Sense  and  anti-sense  message  derived  from  genes  that  do  not  contain  introns  were 
approximately  the  same  size,  suggesting  that  transcription  may  also  initiate  from  the 
3’ downstream intergenic region of genes. The presence of anti-sense transcripts for 17%
of  annotated  genes  implies  novel  mechanisms  of  transcriptional  initiation  and  termination, 
including  potential  roles  in  post-transcriptional  control  of  protein  expression. 

In  conclusion,  we  have  shown  that  SAGE  can  be  readily  adapted  for  the  study  of 
global  transcription  in  Plasmodium  falciparum.  SAGE  of  3D7  asexual  parasites  sheds 
light  on  the  prominent  metabolic  pathways  utilized  in  these  stages.  Since  blood  stages  are 
the  targets  of  both  anti-malarial  drugs  and  the  host  immune  system,  this  comprehensive 
transcriptional  profile  generated  by  SAGE  will  form  the  basis  for  future  comparisons  of 
gene expression under drug or immune pressure. Finally, the unique nature of SAGE
reveals a novel phenomenon, anti-sense transcription, that has previously been
missed.

Figure legends


Figure  1:  BLASTx  analysis  of  highly  abundant  SAGE  tags.  SAGE  tags  from  the  3D7 
library  were  analyzed  using  the  SAGE  software  (as  described  in  the  methods  section). 
SAGE  tags  at  an  abundance  level  of  5  or  greater  were  further  analyzed  as  depicted  in  the 
flow  chart.  The  percentage  of  single  matches,  no  matches  and  multiple  matches  to  the 
databases are indicated. Numbers in brackets correspond to the proportion of tag counts
in each group. Single matches were divided into tags that matched Annotated and
Unannotated sequence reads. Tags in the latter group were characterized by BLASTx.
* 14 bp tags that failed to match either database were analyzed using only the first 13 bp
of the tag sequence.

Fig 2: SAGE 3D7 library analysis. A. Cumulative total gene representation within the
3D7  SAGE  library.  Ascertained  tags  (from  the  3D7  library)  at  increasing  increments  were 
plotted  against  the  number  of  unique  genes  from  which  the  tag  subsets  were  derived.  The 
solid  line  corresponds  to  all  ascertained  tags.  The  dotted  line  corresponds  to  the  number 
of  unique  genes  represented  by  tags  at  an  abundance  level  of  2  or  greater  (some  tags  at  an 
abundance  level  of  1  may  be  derived  from  sequencing  errors).  B.  SAGE  abundance 
classes.  8335  SAGE  tags  are  divided  into  abundance  classes.  The  number  of  unique  tags 
matching  an  entry  in  the  P. falciparum  NCBI  database  is  listed  per  abundance  class  (last 
column),  and  the  percentage  of  hits  within  the  abundance  class  is  given  in  brackets. 

Fig  3  :  Categories  of  highly  expressed  genes  in  the  3D7  control  population.  Highly 
abundant  tags  (percentage  frequency  of  greater  than  0.05)  were  examined  for  their 
matches  in  P. falciparum  databases  (see  analysis  described  in  figure  1)  and  categorized  by 
putative  functions.  The  number  of  genes  in  each  category  is  depicted. 

Figure 4: Validation of SAGE data by RT-PCR and Northern analysis. A. RT-PCR
of genes represented in the 3D7 SAGE library. RT-PCR products generated by specific
primers for pfg27/25 (lane 1 and 2), rap-1 (lane 3 and 4), msp-3 (lane 5 and 6) and
calmodulin (lane 7 and 8) are shown. RT minus samples were also PCR amplified as
negative controls (lanes 2, 4, 6, and 8). pBR322 MspI digest (lane M). B. Summary of
expression data. For Northern analysis 1 µg of mRNA from 3D7 cultures was gel
electrophoresed, blotted onto a nylon membrane and probed with gene-specific 32P-labeled
DNA  probes.  +  indicates  that  a  specific  signal  corresponding  to  each  transcript  was 
detected  in  the  northern  analysis.  Tag  counts  from  the  3D7  library  (8335  tags)  for  all  four 
genes  are  listed.  Quantitative  northern  analysis  was  carried  out  for  calmodulin  and  msp-3. 
Northern  blots  and  gene  specific  probes  were  prepared  as  described  above.  Known 
amounts  of  in-vitro  synthesized  RNA  fragments  were  run  alongside  the  mRNA  sample  as 
markers  for  quantification.  Signal  intensities  for  each  of  the  transcripts  in  the  mRNA 
sample  were  converted  to  molar  amounts  by  reference  to  those  of  the  synthetic  RNAs. 

Fig 5: Validation of anti-sense SAGE data by strand-specific RT-PCR and
strand-specific northern analysis in asexual stage parasites. A. Strand-specific RT-PCR
analysis. First strand cDNA from asexual stages was synthesized with a primer that
specifically hybridizes to either anti-sense message (A) for pfg27/25 (lane 1 and 2), rap-1
(lane 5 and 6), msp-3 (lane 9 and 10) and calmodulin (lane 13 and 14); or a primer that
binds to sense message (S) for pfg27/25 (lane 3 and 4), rap-1 (lane 7 and 8), msp-3 (lane
11 and 12) and calmodulin (lane 15 and 16). RT-PCR was performed on these cDNA
samples using the same primer pair for pfg27/25 (lane 1-4), rap-1 (lane 5-8), msp-3 (lane 9-12)
and calmodulin (lane 13-16). Products in lanes 1, 3, 5, 7, 9, 11, 13, and 15 were cloned to
confirm gene identity. "+" indicates that the gel product corresponded to the specific gene
under investigation. Only the product in lane 1 was generated by non-specific PCR
amplification. RT minus samples were also PCR amplified as negative
controls (lanes 2, 4, 6, 8, 10, 12, 14, and 16). pBR322 MspI digest (lanes M). Genes for
which anti-sense transcripts were found in the SAGE library are indicated by "yes" or
"no" (last row). B. Strand-specific northern analysis. Synthetic sense (S) RNA fragments
(lane 1 and 4 for calmodulin; lane 7 and 10 for msp-3), or synthetic anti-sense (A) RNA
fragments (lane 2 and 5 for calmodulin; lane 8 and 11 for msp-3), or 20 µg of total RNA
from 3D7 cultures (lanes 3, 6, 9 and 12) were gel electrophoresed. Blots were transferred
onto nitrocellulose membranes and probed with 32P-labeled sense (S) RNA probes
(calmodulin: lane 1-3; msp-3: lane 7-9) or anti-sense (A) RNA probes (calmodulin: lane
4-6; msp-3: lane 10-12).

Fig 6: Strand-specific RT-PCR analysis of pgs28 in sexual stage parasites. First strand
cDNA from sexual stages was synthesized with a primer that specifically hybridizes to
either anti-sense message (A) for pgs28 (lane 1, 2, 5, 6, 9, and 10); or a primer that binds to
sense message (S) for pgs28 (lane 3, 4, 7, 8, 11, 12). cDNA was synthesized from total RNA
obtained from purified gametes and zygotes immediately (lane 1-4), 24 hours (lane 5-8)
and 48 hours (lane 9-12) after isolation. RT-PCR was performed on all samples using the
same primer pair. RT minus samples were also PCR amplified as negative controls (lanes
2, 4, 6, 8, 10, 12). pBR322 MspI digest (lanes M).

[Figure graphics not reproduced. Recoverable labels: flow chart heading "Search for 14 bp tag on NCBI database and compiled database of P. falciparum genomic sequence" with outcomes "Single match" and "No match"; protein classes "Hypothetical proteins", "Predicted proteins" and "known proteins"; axis label "Number of genes"; summary rows "counts", "(femtomoles of transcript)" and "confirmed by sequence".]

Table I. Highly expressed genes in the 3D7 library

Tag          % Abundance   Gene description
TCAGGCGTTA   1.18          mitochondrial 6Kb product
GTGGTGGTGC   0.70          no match to database
GAGCAAGCAG   0.58          unknown protein
CAAGTCGAAA   0.44          5.8S ribosomal RNA
ATTTGAAGCA   0.36          Rhop H3
CTAAAGCACC   0.28          ras-related nuclear protein
TTGAAGCTGA   0.26          heat shock protein-70
AACGACAAGA   0.25          Pfg27/25
CCAAATGATG   0.25          polyubiquitin
TACAGCTGCT   0.18          merozoite surface protein-3
GGAAATAAAG   0.17          tartrate resistant acid phosphatase
TGAGTCAAAC   0.17          no match to database
GGCACAACTA   0.17          thioredoxin
TAAACTTTTG   0.16          rhoptry-associated protein-1
TTGTTTCATA   0.09          rifin
CGAGTAAAAG   0.09          1-deoxy-D-xylulose 5-phosphate synthase
CCAACTAAGG   0.07          ATPase subunit B
TGATGGCTTG   0.07          ornithine decarboxylase
TTCCGAACTT   0.07          triose phosphate isomerase
AGAGATCCGC   0.06          ubiquinol cytochrome-c reductase like protein
GATACTCTTG   0.06          26S proteasome beta subunit

Tag represents the 10 bp SAGE tag sequence adjacent to the NlaIII site.
Abundance is listed as a percentage of all 8335 tags in the SAGE library.
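
Because abundance is reported as a percentage of the 8335 tags, an approximate tag count can be recovered for any row. The sketch below is illustrative only: the function name is hypothetical, and the results are approximate because the percentages in the table are rounded to two decimal places.

```python
LIBRARY_SIZE = 8335  # total tags in the 3D7 SAGE library, as stated in the table note


def approx_tag_count(percent_abundance, library_size=LIBRARY_SIZE):
    """Approximate tag count implied by a percent abundance (rounded input)."""
    return round(percent_abundance / 100 * library_size)


# A few rows from Table I.
for gene, percent in [
    ("mitochondrial 6Kb product", 1.18),
    ("merozoite surface protein-3", 0.18),
    ("26S proteasome beta subunit", 0.06),
]:
    print(f"{gene}: about {approx_tag_count(percent)} tags")
```

For example, 0.06% of 8335 corresponds to about 5 tags, which matches the abundance threshold of 5 or greater used for the tags analyzed in Figure 1.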


